COMPOSITIONS AND METHODS FOR TAXOL BIOSYNTHESIS

CROSS REFERENCE TO RELATED CASE 
This application claims the benefit of U.S. Provisional Application No. 60/015,993, filed 
April 15, 1996, incorporated herein by reference.

TECHNICAL FIELD 

This invention is related to the field of detection of diterpenoid biosynthesis, particularly to
the biosynthesis of taxoid compounds such as Taxol.

ACKNOWLEDGMENT OF GOVERNMENT SUPPORT
This invention was made with government support under National Institutes of Health Grant
No. CA-55254. The government has certain rights in this invention.

BACKGROUND ART 

The highly functionalized diterpenoid Taxol (Wani et al., J. Am. Chem. Soc. 93:2325-2327,
1971) is well-established as a potent chemotherapeutic agent (Holmes et al., in Taxane Anticancer
Agents: Basic Science and Current Status, Georg et al., eds., pp. 31-57, American Chemical Society,
Washington, DC, 1995; Arbuck and Blaylock, in Taxol: Science and Applications, Suffness, ed.,
pp. 379-415, CRC Press, Boca Raton, FL, 1995). (Paclitaxel is the generic name for Taxol, a
registered trademark of Bristol-Myers Squibb.)

The supply of Taxol from the original source, the bark of the Pacific yew (Taxus brevifolia
Nutt.; Taxaceae), is limited. As a result, there have been intensive efforts to develop alternate means
of production, including isolation from the foliage and other renewable tissues of plantation-grown
Taxus species, biosynthesis in tissue culture systems, and semisynthesis of Taxol and its analogs from
advanced taxane diterpenoid (taxoid) metabolites that are more readily available (Cragg et al., J. Nat.
Prod. 56:1657-1668, 1993). Total synthesis of Taxol, at present, is not commercially viable
(Borman, Chem. Eng. News 72(7):32-34, 1994), and it is clear that in the foreseeable future the
supply of Taxol and its synthetically useful progenitors must rely on biological methods of
production, either in Taxus plants or in cell cultures derived therefrom (Suffness, in Taxane
Anticancer Agents: Basic Science and Current Status, Georg et al., eds., American Chemical
Society, Washington, DC, 1995, pp. 1-17).

The biosynthesis of Taxol involves the initial cyclization of geranylgeranyl diphosphate, the
universal precursor of diterpenoids (West, in Biosynthesis of Isoprenoid Compounds, Porter and
Spurgeon, eds., vol. 1, pp. 375-411, Wiley & Sons, New York, NY, 1981), to
taxa-4(5),11(12)-diene (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995), followed by extensive
oxidative modification of this olefin (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995; Croteau et
al., in Taxane Anticancer Agents: Basic Science and Current Status, Georg et al., eds., pp. 72-80,
American Chemical Society, Washington, DC, 1995) and elaboration of the side chains (FIG. 1)
(Floss and Mocek, in Taxol: Science and Applications, Suffness, ed., pp. 191-208, CRC Press, Boca
Raton, FL, 1995).

Taxa-4(5),11(12)-diene synthase ("taxadiene synthase"), the enzyme responsible for the
initial cyclization of geranylgeranyl diphosphate to delineate the taxane skeleton, has been isolated
from T. brevifolia stem tissue, partially purified, and characterized (Hezari et al., Arch. Biochem.
Biophys. 322:437-444, 1995).

Although taxadiene synthase resembles other plant terpenoid cyclases in general enzymatic
properties (Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995), it has proved extremely
difficult to purify in sufficient amounts for antibody preparation or microsequencing, thwarting this
approach toward cDNA cloning.

SUMMARY OF THE INVENTION 
We have cloned and sequenced the taxadiene synthase gene of Pacific yew. 
One embodiment of the invention includes isolated polynucleotides comprising at least 15
consecutive nucleotides, preferably at least 20, more preferably at least 25, and most preferably at
least 30 consecutive nucleotides of a native taxadiene synthase gene, e.g., the taxadiene synthase gene
of Pacific yew. Such polynucleotides are useful, for example, as probes and primers for obtaining
homologs of the taxadiene synthase gene of Pacific yew by, for example, contacting a nucleic acid
of a taxoid-producing organism with such a probe or primer under stringent hybridization conditions
to permit the probe or primer to hybridize to a taxadiene synthase gene of the organism, then
isolating the taxadiene synthase gene of the organism to which the probe or primer hybridizes.

Another embodiment of the invention includes isolated polynucleotides comprising a
sequence that encodes a polypeptide having taxadiene synthase biological activity. Preferably, the
polypeptide-encoding sequence has at least 70%, preferably at least 80%, and more preferably at least
90% nucleotide sequence similarity with a native Pacific yew taxadiene synthase polynucleotide gene.

In preferred embodiments of such polynucleotides, the polypeptide-encoding sequence
encodes a polypeptide having only conservative amino acid substitutions to the native Pacific yew
taxadiene synthase polypeptide, except, in some embodiments, for amino acid substitutions at one or
more of: cysteine residues 329, 650, 719, and 777; histidine residues 370, 415, 579, and 793; a
DDXXD motif; a DXXDD motif; a conserved arginine; and a RWWK element. Preferably, the
encoded polypeptide has only conservative amino acid substitutions to or is completely homologous
with the native Pacific yew taxadiene synthase polypeptide. In addition, the encoded polypeptide
preferably lacks at least part of the transit peptide. Also included are cells, particularly plant cells,
and transgenic plants that include such polynucleotides and the encoded polypeptides.

Another embodiment of the invention includes isolated polypeptides having taxadiene
synthase activity, preferably having at least 70%, more preferably at least 80%, and most preferably
at least 90% homology with a native taxadiene synthase polypeptide. Also included are isolated
polypeptides that comprise at least 10, preferably at least 20, more preferably at least 30 consecutive
amino acids of a native Pacific yew taxadiene synthase, and most preferably the mature Pacific yew
taxadiene synthase polypeptide (i.e., lacking only the transit peptide).

Another embodiment of the invention includes antibodies specific for a native Pacific yew 
taxadiene synthase polypeptide. 

Another embodiment of the invention includes methods of expressing a taxadiene synthase
polypeptide in a cell, e.g., a taxoid-producing cell, by culturing a cell that includes an expressible
polynucleotide encoding a taxadiene synthase polypeptide under conditions suitable for expression of
the polypeptide, preferably resulting in the production of the taxoid at levels that are higher than
would be expected from an otherwise similar cell that lacks the expressible polynucleotide.

The foregoing and other objects and advantages of the invention will become more apparent 
from the following detailed description and accompanying drawings. 


BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 shows steps in the biosynthesis of Taxol, including the initial cyclization of
geranylgeranyl diphosphate to taxa-4(5),11(12)-diene, followed by extensive oxidative modification
and elaboration of the side chains.
FIG. 2 shows the nucleotide and predicted amino acid sequence of Pacific yew taxadiene
synthase clone pTb 42.1. The start and stop codons are underlined. The locations of regions
employed for primer synthesis are double underlined. The DDMAD and DSYDD motifs are in
boldface. Conserved histidines (H) and cysteines (C) and an RWWK element are indicated by boxes.
Truncation sites for removal of part or all of the transit peptide are indicated by a triangle (▼).


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
A homology-based cloning strategy using the polymerase chain reaction (PCR) was employed
to isolate a cDNA encoding taxadiene synthase. A set of degenerate primers was constructed based
on consensus sequences of related monoterpene, sesquiterpene, and diterpene cyclases. Two of these
primers amplified an 83 base pair (bp) fragment that was cyclase-like in sequence and that was
employed as a hybridization probe to screen a cDNA library constructed from poly(A)+ RNA
extracted from Pacific yew stems. Twelve independent clones with insert size in excess of two
kilobase pairs (kb) were isolated and partially sequenced.

One of these cDNA isolates was functionally expressed in Escherichia coli, yielding a
protein that was catalytically active in converting geranylgeranyl diphosphate to a diterpene olefin that
was confirmed to be taxa-4(5),11(12)-diene by combined capillary gas chromatography-mass
spectrometry (Satterwhite and Croteau, J. Chromatography 452:61-73, 1988).

The taxa-4(5),11(12)-diene synthase cDNA sequence specifies an open reading frame of 2586
nucleotides. The deduced polypeptide sequence contains 862 amino acid residues and has a molecular
weight of 98,303, compared to about 79,000 previously determined for the mature native enzyme.
It therefore appears to be full-length and includes a long presumptive plastidial targeting peptide.
Sequence comparisons with monoterpene, sesquiterpene, and diterpene cyclases of plant origin
indicate a significant degree of similarity between these enzymes; the taxadiene synthase most closely
resembles (46% identity, 67% similarity) abietadiene synthase, a diterpene cyclase from grand fir.
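
The figures in the preceding paragraph can be checked with simple arithmetic. The following Python sketch is illustrative only; it uses just the numbers stated above and treats the mature-enzyme molecular weight as the approximate value it is.

```python
# Arithmetic check of the figures quoted in the text (no sequence data needed).
ORF_NUCLEOTIDES = 2586        # open reading frame length stated above
PREPROTEIN_MW = 98_303        # deduced molecular weight of the full-length polypeptide
MATURE_MW_APPROX = 79_000     # approximate molecular weight of the mature native enzyme

# Each amino acid residue is specified by one three-nucleotide codon.
codons, remainder = divmod(ORF_NUCLEOTIDES, 3)

# Mass not accounted for by the mature enzyme, attributed in the text to a
# presumptive plastidial targeting peptide.
mass_difference = PREPROTEIN_MW - MATURE_MW_APPROX

print(f"codons: {codons} (remainder {remainder})")
print(f"approximate mass attributed to targeting peptide: {mass_difference}")
```

The codon count agrees with the 862 residues of the deduced polypeptide, and the mass difference is consistent with the presence of a long N-terminal targeting sequence.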

Uses of the Taxadiene Synthase Gene 

Increasing Taxol Biosynthesis in Transformed Cells. The committed step of Taxol
(paclitaxel) biosynthesis is the initial cyclization of geranylgeranyl diphosphate, a ubiquitous
isoprenoid intermediate, catalyzed by taxadiene synthase, a diterpene cyclase. The product of this
reaction is the parent olefin with a taxane skeleton, taxa-4(5),11(12)-diene. For a review of taxoids
and taxoid biochemistry, see, e.g., Kingston et al., "The Taxane Diterpenoids," Progress in the
Chemistry of Organic Natural Products, vol. 61, Springer Verlag, New York, 1993, pp. 1-206.

The committed cyclization step of the target pathway is a slow step in the extended
biosynthetic sequence leading to Taxol and related taxoids (Koepp et al., J. Biol. Chem. 270:8686-
8690, 1995; Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995). The yield of Taxol and
related taxoids (e.g., cephalomannine, baccatins, taxinines, among others) in cells of an organism
capable of taxoid biosynthesis is increased by the expression in such cells of a recombinant taxadiene
synthase gene.

This approach to increasing taxoid biosynthesis can be used in any organism that is capable
of taxoid biosynthesis. Taxol synthesis is known to take place, for example, in the Taxaceae,
including Taxus species from all over the world (including, but not limited to, T. brevifolia, T.
baccata, T. x media, T. cuspidata, T. canadensis, and T. chinensis), as well as in certain
microorganisms. Taxol may also be produced by a fungus, Taxomyces andreanae (Stierle et al.,
Science 260:214, 1993).

Agrobacterium tumefaciens-mediated transformation of Taxus species has been described and
the resulting callus cultures shown to produce Taxol (Han et al., Plant Sci. 95:187-196, 1994).

Taxol can be isolated from cells transformed with the taxadiene synthase gene by
conventional methods. The production of callus and suspension cultures of Taxus, and the isolation
of Taxol and related compounds from such cultures, has been described (for example, in Fett-Netto
et al., Bio/Technology 10:1572-1575, 1992).

Biosynthesis of taxoids in microorganisms. As discussed below, taxadiene synthase activity
was observed in transformed E. coli host cells expressing recombinant taxadiene synthase. Taxadiene
synthase does not require extensive post-translational modification, as provided, for example, in
mammalian cells, for enzymatic function. As a result, functional taxadiene synthase can be expressed
in a wide variety of host cells.

Geranylgeranyl diphosphate, a substrate of taxadiene synthase, is produced in a wide variety
of organisms, including bacteria and yeast that synthesize carotenoid pigments (e.g., Serratia spp.
and Rhodotorula spp.). Introduction of vectors capable of expressing taxadiene synthase in such
microorganisms permits the production of large amounts of taxa-4(5),11(12)-diene and related
compounds having the taxane backbone. The taxane backbone thus produced is useful as a chemical
feedstock. Simple taxoids, for example, would be useful as perfume fixatives.

Cloning taxadiene synthase homologs and related genes. The availability of the taxadiene
synthase gene from Pacific yew makes possible the cloning of homologs of taxadiene synthase from
other organisms capable of taxoid biosynthesis, particularly Taxus spp. Although the proportion of
common taxoids varies with the species or cultivar of yew tested, apparently all Taxus species
synthesize taxoids, including Taxol, to some degree (see, e.g., Mattina and Palva, J. Environ. Hort.
10:187-191, 1992; Miller, J. Natural Products 43:425-437, 1980). Taxol may also be produced by
a fungus, Taxomyces andreanae (Stierle et al., Science 260:214, 1993).

A taxadiene synthase gene can be isolated from any organism capable of producing Taxol 
or related taxoids by using primers or probes based on the Pacific yew taxadiene synthase gene 
sequence or antibodies specific for taxadiene synthase by conventional methods. 

Modified forms of taxadiene synthase gene and polypeptide. Knowledge of the taxadiene
synthase gene sequence permits the modification of the sequence, as described more fully below, to
produce variant forms of the gene and the polypeptide gene product. For example, the plastidial
transit peptide can be removed and/or replaced by other transit peptides to allow the gene product
to be directed to various intracellular compartments or exported from a host cell.

DEFINITIONS AND METHODS 

The following definitions and methods are provided to better define the present invention
and to guide those of ordinary skill in the art in the practice of the present invention. Definitions of
common terms in molecular biology may also be found in Rieger et al., Glossary of Genetics:
Classical and Molecular, 5th edition, Springer-Verlag, New York, 1991; and Lewin, Genes V,
Oxford University Press, New York, 1994.

The term "plant" encompasses any plant and progeny thereof. The term also encompasses
parts of plants, including seed, cuttings, tubers, fruit, flowers, etc.

A "reproductive unit" of a plant is any totipotent part or tissue of the plant from which one
can obtain a progeny of the plant, including, for example, seeds, cuttings, buds, bulbs, somatic
embryos, cultured cells (e.g., callus or suspension cultures), etc.

Nucleic Acids 

Nucleic acids (a term used interchangeably with "polynucleotides" herein) that are useful in
the practice of the present invention include the isolated taxadiene synthase gene, its homologs in
other plant species, and fragments and variants thereof.

The term "taxadiene synthase gene" refers to a nucleic acid that contains a taxa-4(5),11(12)-diene
synthase sequence, preferably a nucleic acid that encodes a polypeptide having taxadiene
synthase enzymatic activity. This term relates primarily to the isolated full-length taxadiene synthase
cDNA from Pacific yew discussed above and shown in FIG. 2 and the corresponding genomic
sequence (including flanking or internal sequences operably linked thereto, including regulatory
elements and/or intron sequences).

This term also encompasses alleles of the taxadiene synthase gene from Pacific yew.

"Native". The term "native" refers to a naturally-occurring ("wild-type") nucleic acid or
polypeptide.

"Homolog". A "homolog" of the taxadiene synthase gene is a gene sequence encoding a
taxadiene synthase isolated from an organism other than Pacific yew.

"Isolated". An "isolated" nucleic acid is one that has been substantially separated or purified
away from other nucleic acid sequences in the cell of the organism in which the nucleic acid naturally
occurs, i.e., other chromosomal and extrachromosomal DNA and RNA, by conventional nucleic acid-purification
methods. The term also embraces recombinant nucleic acids and chemically synthesized
nucleic acids.

Fragments, probes, and primers. A fragment of a taxadiene synthase nucleic acid according
to the present invention is a portion of the nucleic acid that is less than full-length and comprises at
least a minimum length capable of hybridizing specifically with the taxadiene synthase nucleic acid
of FIG. 2 under stringent hybridization conditions. The length of such a fragment is preferably
15-17 nucleotides or more.

Nucleic acid probes and primers can be prepared based on the taxadiene synthase gene
sequence provided in FIG. 2. A "probe" is an isolated DNA or RNA attached to a detectable label
or reporter molecule, e.g., a radioactive isotope, ligand, chemiluminescent agent, or enzyme.
"Primers" are isolated nucleic acids, generally DNA oligonucleotides 15 nucleotides or more in
length, that are annealed to a complementary target DNA strand by nucleic acid hybridization to form
a hybrid between the primer and the target DNA strand, then extended along the target DNA strand
by a polymerase, e.g., a DNA polymerase. Primer pairs can be used for amplification of a nucleic
acid sequence, e.g., by the polymerase chain reaction (PCR) or other conventional nucleic-acid
amplification methods.

Methods for preparing and using probes and primers are described, for example, in
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., vol.
1-3, ed. Sambrook et al., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989;
Current Protocols in Molecular Biology, ed. Ausubel et al., Greene Publishing and Wiley-Interscience,
New York, 1987 (with periodic updates); and Innis et al., PCR Protocols: A Guide to
Methods and Applications, Academic Press, San Diego, 1990. PCR-primer pairs can be derived
from a known sequence, for example, by using computer programs intended for that purpose such
as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA).
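
As a rough illustration of deriving a primer pair from a known sequence, the Python sketch below takes a forward primer from the 5' end of a template and a reverse primer as the reverse complement of its 3' end, then estimates melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a first approximation for short oligonucleotides. This is not the algorithm of the Primer program cited above; the function names and the 15-nucleotide default are assumptions chosen to match the minimum primer length given in the text.

```python
# Minimal, illustrative primer helper; not the method of any cited program.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair(template: str, length: int = 15) -> tuple[str, str]:
    """Pick end-anchored forward and reverse primers of the given length.

    The forward primer is the first `length` bases of the template strand;
    the reverse primer is the reverse complement of the last `length` bases.
    """
    if len(template) < 2 * length:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

In practice, primer selection also weighs specificity, secondary structure, and primer-dimer formation, none of which this sketch attempts.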
Nucleotide sequence similarity. Nucleotide sequence "similarity" is a measure of the degree
to which two polynucleotide sequences have identical nucleotide bases at corresponding positions in
their sequence when optimally aligned (with appropriate nucleotide insertions or deletions). Sequence
similarity can be determined using sequence analysis software such as the Sequence Analysis
Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center,
Madison, WI. Preferably, a variant form of a taxadiene synthase polynucleotide has at least 70%,
more preferably at least 80%, and most preferably at least 90% nucleotide sequence similarity with
a native taxadiene synthase gene, particularly with a native Pacific yew taxadiene synthase, as
provided in FIG. 2.
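
The definition above can be made concrete with a short Python sketch that scores two sequences that have already been aligned (gaps written as "-"). It is an illustration under stated assumptions, not the procedure of the software package named above: the alignment itself is taken as given, and the denominator is the number of aligned columns in which at least one sequence has a base, which is one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical bases.

    Both inputs must be the same length and use '-' for gaps. Columns where
    both sequences have a gap are ignored; a gap opposite a base counts as
    a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the given percent similarity (70, 80, or 90 in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, a ten-column alignment with one gap and one substitution scores eight matches out of ten columns and so passes the 70% and 80% thresholds but not the 90% threshold.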

Operably linked. A first nucleic-acid sequence is "operably" linked with a second nucleic-acid
sequence when the first nucleic-acid sequence is placed in a functional relationship with the
second nucleic-acid sequence. For instance, a promoter is operably linked to a coding sequence if
the promoter affects the transcription or expression of the coding sequence. Generally, operably
linked DNA sequences are contiguous and, where necessary to join two protein coding regions, in
reading frame.

"Recombinant". A "recombinant" nucleic acid is a nucleic acid made by an
artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis
or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.

Techniques for nucleic-acid manipulation are described generally in, for example, Sambrook
et al. (1989) and Ausubel et al. (1987, with periodic updates). Methods for chemical synthesis of
nucleic acids are discussed, for example, in Beaucage and Caruthers, Tetra. Letts. 22:1859-1862,
1981, and Matteucci et al., J. Am. Chem. Soc. 103:3185, 1981. Chemical synthesis of nucleic acids
can be performed, for example, on commercial automated oligonucleotide synthesizers.

Preparation of recombinant or chemically synthesized nucleic acids; vectors, transformation,
host cells. Natural or synthetic nucleic acids according to the present invention can be incorporated
into recombinant nucleic-acid constructs, typically DNA constructs, capable of introduction into and
replication in a host cell. Such a construct preferably is a vector that includes a replication system
and sequences that are capable of transcription and translation of a polypeptide-encoding sequence
in a given host cell. For the practice of the present invention, conventional compositions and
methods for preparing and using vectors and host cells are employed, as discussed, inter alia, in
Sambrook et al., 1989, or Ausubel et al., 1987.

A "transformed" or "transgenic" cell, tissue, organ, or organism is one into which a foreign
nucleic acid has been introduced. A "transgenic" or "transformed" cell or organism also includes
(1) progeny of the cell or organism and (2) progeny produced from a breeding program employing
a "transgenic" plant as a parent in a cross and exhibiting an altered phenotype resulting from the
presence of the "transgene," i.e., the recombinant taxadiene synthase nucleic acid.

Nucleic-Acid Hybridization; "Stringent Conditions"; "Specific". The nucleic-acid probes
and primers of the present invention hybridize under stringent conditions to a target DNA sequence,
e.g., to the taxadiene synthase gene.

The term "stringent conditions" is functionally defined with regard to the hybridization of
a nucleic-acid probe to a target nucleic acid (i.e., to a particular nucleic-acid sequence of interest)
by the hybridization procedure discussed in Sambrook et al., 1989, at 9.52-9.55. See also,
Sambrook et al., 1989 at 9.47-9.52, 9.56-9.58; Kanehisa, Nucl. Acids Res. 12:203-213, 1984; and
Wetmur and Davidson, J. Mol. Biol. 31:349-370, 1968.

Regarding the amplification of a target nucleic-acid sequence (e.g., by PCR) using a
particular amplification primer pair, stringent conditions are conditions that permit the primer pair
to hybridize only to the target nucleic-acid sequence to which a primer having the corresponding
wild-type sequence (or its complement) would bind and preferably to produce a unique amplification
product.

The term "specific for (a target sequence)" indicates that a probe or primer hybridizes under 
stringent conditions only to the target sequence in a sample comprising the target sequence. 

Nucleic-acid amplification. As used herein, "amplified DNA" refers to the product of
nucleic-acid amplification of a target nucleic-acid sequence. Nucleic-acid amplification can be
accomplished by any of the various nucleic-acid amplification methods known in the art, including
the polymerase chain reaction (PCR). A variety of amplification methods are known in the art and
are described, inter alia, in U.S. Patent Nos. 4,683,195 and 4,683,202 and in PCR Protocols: A
Guide to Methods and Applications, Innis et al., eds., Academic Press, San Diego, 1990.

Methods of making cDNA clones encoding taxadiene synthase or homologs thereof. Based
upon the availability of the taxadiene synthase cDNA as disclosed herein, other taxadiene synthase
genes (e.g., alleles and homologs of taxadiene synthase) can be readily obtained from a wide variety
of plants by cloning methods known in the art.

One or more primer pairs based on the taxadiene synthase sequence can be used to amplify
such taxadiene synthase genes or their homologs by the polymerase chain reaction (PCR) or other
conventional amplification methods. Alternatively, the disclosed taxadiene synthase cDNA or
fragments thereof can be used to probe a cDNA or genomic library made from a given plant species
by conventional methods.

Cloning of the taxadiene synthase genomic sequence and homologs thereof. The availability
of the taxadiene synthase cDNA sequence enables those skilled in the art to obtain a genomic clone
corresponding to the taxadiene synthase cDNA (including the promoter and other regulatory regions
and intron sequences) and to determine its nucleotide sequence by conventional methods.

Virtually all Taxus species synthesize taxoids, including Taxol, to some degree (see, e.g.,
Mattina and Palva, J. Environ. Hort. 10:187-191, 1992; Miller, J. Natural Products 43:425-437,
1980). Any organism that produces taxoids would be expected to express a homolog of taxadiene
synthase. Taxadiene synthase genes can be obtained by hybridization of a Pacific yew taxadiene
synthase probe to a cDNA or genomic library of a target species. Such a homolog can also be
obtained by PCR or other amplification method from genomic DNA or RNA of a target species using
primers based on the taxadiene synthase sequence shown in FIG. 2. Genomic and cDNA libraries
from yew or other plant species can be prepared by conventional methods.

Primers and probes based on the sequence shown in FIG. 2 can be used to confirm (and,
if necessary, to correct) the taxadiene synthase sequence by conventional methods.

Nucleotide-Sequence Variants of taxadiene synthase cDNA and Amino Acid Sequence
Variants of taxadiene synthase Protein. Using the nucleotide and the amino-acid sequence of the
taxadiene synthase protein disclosed herein, those skilled in the art can create DNA molecules and
polypeptides that have minor variations in their nucleotide or amino acid sequence.

"Variant" DNA molecules are DNA molecules containing minor changes in the native
taxadiene synthase sequence, i.e., changes in which one or more nucleotides of a native taxadiene
synthase sequence is deleted, added, and/or substituted, preferably while substantially maintaining
taxadiene synthase activity. Variant DNA molecules can be produced, for example, by standard
DNA mutagenesis techniques or by chemically synthesizing the variant DNA molecule or a portion
thereof. Such variants preferably do not change the reading frame of the protein-coding region of
the nucleic acid and preferably encode a protein having no change, only a minor reduction, or an
increase in taxadiene synthase biological function.

Amino-acid substitutions are preferably substitutions of single amino-acid residues. DNA
insertions are preferably of about 1 to 10 contiguous nucleotides and deletions are preferably of about
1 to 30 contiguous nucleotides. Insertions and deletions are preferably insertions or deletions from
an end of the protein-coding or non-coding sequence and are preferably made in adjacent base pairs.
Substitutions, deletions, insertions or any combination thereof can be combined to arrive at a final
construct.

Preferably, variant nucleic acids according to the present invention are "silent" or
"conservative" variants. "Silent" variants are variants of a native taxadiene synthase sequence or a
homolog thereof in which there has been a substitution of one or more base pairs but no change in
the amino-acid sequence of the polypeptide encoded by the sequence. "Conservative" variants are
variants of the native taxadiene synthase sequence or a homolog thereof in which at least one codon
in the protein-coding region of the gene has been changed, resulting in a conservative change in one
or more amino acid residues of the polypeptide encoded by the nucleic-acid sequence, i.e., an amino
acid substitution. A number of conservative amino acid substitutions are listed below. In addition,
one or more codons encoding cysteine residues can be substituted for, resulting in a loss of a cysteine
residue and affecting disulfide linkages in the taxadiene synthase polypeptide.

Original Residue    Conservative Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln; His
Asp                 Glu
Cys
Gln                 Asn
Glu                 Asp
Gly                 Pro
His
Ile                 Leu; Val
Leu                 Ile; Val
Lys                 Arg; Gln; Glu
Met                 Leu; Ile
Phe                 Met; Leu; Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp; Phe
Val                 Ile; Leu

Substantial changes in function are made by selecting substitutions that are less conservative
than those listed above, e.g., causing changes in: (a) the structure of the polypeptide backbone in
the area of the substitution; (b) the charge or hydrophobicity of the polypeptide at the target site; or
(c) the bulk of an amino acid side chain. Substitutions generally expected to produce the greatest
changes in protein properties are those in which: (a) a hydrophilic residue, e.g., seryl or threonyl,
is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl;
(b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an
electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative
residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine,
is substituted for (or by) one not having a side chain, e.g., glycine.
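
For screening candidate variants, the substitution table above can be encoded as a simple lookup. The Python sketch below is illustrative only: the dictionary reproduces the entries legible in the table, the Cys and His rows are omitted because no substitutions are listed for them there, and the function name is an assumption introduced for this example.

```python
# Conservative substitutions as listed in the table (three-letter codes).
# Cys and His are omitted: the table lists no substitutions for them.
CONSERVATIVE = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` appears in the table."""
    return replacement in CONSERVATIVE.get(original, set())
```

Note that the table is directional as written: Lys may be replaced by Glu, for example, but Glu is listed only with Asp.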

The taxadiene synthase gene sequence can be modified as follows: 

(1) To improve expression efficiency and redirect the targeting of the expressed polypeptide:
For expression in non-plant hosts (or to direct the expressed polypeptide to a different intracellular
compartment in a plant host), the native gene sequence can be truncated from the 5' end to remove
the sequence encoding the plastidial transit peptide of approximately 137 amino acids (i.e., to
approximately 138S), leaving the sequence encoding the mature taxadiene synthase polypeptide of
about 725 amino acids. In addition, one or more codons can be changed, for example, to conform
the gene to the codon usage bias of the host cell for improved expression. Enzymatic stability can
be altered by removing or adding one or more cysteine residues, thus removing or adding one or
more disulfide bonds.
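
The 5' truncation described in item (1) amounts to dropping whole codons from the start of the coding sequence. The Python sketch below illustrates this on an arbitrary coding sequence; the function name is hypothetical, and prepending an ATG start codon to the truncated sequence is an assumption made here so that the shortened reading frame can still be translated, not a step specified in the text.

```python
def truncate_transit_peptide(cds: str, transit_residues: int,
                             start_codon: str = "ATG") -> str:
    """Remove the first `transit_residues` codons from a coding sequence.

    Assumption: if the truncated sequence does not already begin with a
    start codon, one is prepended so the reading frame remains translatable.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0 <= transit_residues < len(cds) // 3:
        raise ValueError("transit_residues out of range")
    mature = cds[3 * transit_residues:]
    if not mature.startswith(start_codon):
        mature = start_codon + mature
    return mature


# Residue bookkeeping using the figures in the text:
FULL_LENGTH_RESIDUES = 862
TRANSIT_RESIDUES_APPROX = 137
mature_residues = FULL_LENGTH_RESIDUES - TRANSIT_RESIDUES_APPROX
print(mature_residues)
```

The subtraction reproduces the figure of about 725 amino acids given for the mature polypeptide.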

WO 97/38571    PCT/US97/06320

(2) To alter catalytic efficiency: As discussed below, the aspartate-rich DDXXD motif, thought
to play a role in substrate binding, is also present in taxadiene synthase, as is a related DXXDD motif
(FIG. 2). Histidine and cysteine residues have been implicated at the active sites of several terpenoid
cyclases of plant origin. Histidine residues 370, 415 and 793 and cysteines at residues 329, 650 and
777 of taxadiene synthase are conserved among the plant terpenoid cyclase genes.

One or more conserved histidine and cysteine residues (as discussed below), or semi-conserved
residues such as conserved cysteine residues (e.g., residues 329, 650, 719, and 777) and
histidine residues (e.g., residues 370, 415, 579, and 793), can be mutagenized to alter enzyme
kinetics. In addition, residues adjacent to these conserved histidine and cysteine residues can also
be altered to increase the cysteine or histidine content to improve charge stabilization. By increasing
the aspartate content of the DDXXD and DXXDD motifs (where D is aspartate and X is any amino
acid), which are likely to be involved in substrate/intermediate binding, it is also possible to increase
the enzymatic rate (i.e., the rate-limiting ionization step of the enzymatic reaction). Arginines have
been implicated in binding or catalysis, and conserved arginine residues are also good targets for
mutagenesis. Changing the conserved DDXXD and/or DXXDD motifs (e.g., the aspartate residues
thereof) by conventional site-directed mutagenesis methods to match those of other known enzymes
can also lead to changes in the kinetics or substrate specificity of taxadiene synthase. Additionally,
product formation can be altered by mutagenesis of the RWWK element (residues 564 to 567), which
includes aromatic residues that may play a role in stabilizing carbocationic reaction intermediates.
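
As an illustration of how the DDXXD and DXXDD motifs named above might be located in a one-letter amino acid sequence before planning mutagenesis, the following sketch uses regular expressions. The function name and the toy sequence are hypothetical; positions are reported 1-based, and a lookahead is used so that overlapping occurrences are not missed.

```python
import re

# D = aspartate, X = any residue (written as "." in the regular expression).
MOTIF_PATTERNS = {
    "DDXXD": "DD..D",
    "DXXDD": "D..DD",
}


def find_aspartate_motifs(protein: str):
    """Return (motif name, 1-based start position, matched residues) tuples,
    sorted by position. Overlapping matches are found via a lookahead."""
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequence for illustration only (not taxadiene synthase).
toy_protein = "MKLDDAYDGGSDLQDDA"
toy_hits = find_aspartate_motifs(toy_protein)
```

For the toy sequence, this reports one DDXXD match starting at residue 4 and one DXXDD match starting at residue 12.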

(3) To modify substrate utilization: The enzyme, particularly the active site, can be
modified to allow the enzyme to bind shorter (e.g., C10) or longer (e.g., C25) chains than
geranylgeranyl diphosphate. Substrate size utilization can be altered by increasing or decreasing the
size of the hydrophobic patches to modify the size of the hydrophobic pocket of the enzyme. Similar
effects can be achieved by domain swapping.

(4) To change product outcome: Directed mutagenesis of conserved aspartate and arginine
residues can be used to permit the enzyme to produce different diterpene skeletons with, for example,
one, two, or three rings.

See, e.g., Cane et al., Biochemistry 34:2480-2488, 1995; Joly and Edwards, J. Biol. Chem.
268:26983-26989, 1993; Marrero et al., J. Biol. Chem. 267:533-536, 1992; and Song and Poulter,
Proc. Natl. Acad. Sci. USA 91:3044-3048, 1994.

Expression of taxadiene synthase nucleic acids in host cells. DNA constructs incorporating
a taxadiene synthase gene or fragment thereof according to the present invention preferably place the
taxadiene synthase protein coding sequence under the control of an operably linked promoter that is
capable of expression in a host cell. Various promoters suitable for expression of heterologous genes
in plant cells are known in the art, including constitutive promoters, e.g., the cauliflower mosaic virus
(CaMV) 35S promoter, which is expressed in many plant tissues, organ- or tissue-specific promoters,
and promoters that are inducible by chemicals such as methyl jasmonate, salicylic acid, or safeners,
for example. A variety of other promoters or other sequences useful in constructing expression
vectors are available for expression in bacterial, yeast, mammalian, insect, amphibian, avian, or other

Nucleic acids attached to a solid support. The nucleic acids of the present invention can be
free in solution or attached by conventional means to a solid support, such as a hybridization
membrane (e.g., nitrocellulose or nylon), a bead, or other solid supports known in the art.

Polypeptides 

The term "taxadiene synthase protein" (or polypeptide) refers to a protein encoded by a 
taxadiene synthase gene, including alleles, homologs, and variants thereof, for example. A taxadiene 
synthase polypeptide can be produced by the expression of a recombinant taxadiene synthase nucleic 
acid or be chemically synthesized. Techniques for chemical synthesis of polypeptides are described, 
for example, in Merrifield, J. Amer. Chem. Soc. 85:2149-2156, 1963.

Polypeptide sequence identity and similarity. Ordinarily, taxadiene synthase polypeptides
encompassed by the present invention have at least about 70% amino acid sequence "identity" (or 
homology) compared with a native taxadiene synthase polypeptide, preferably at least about 80% 
identity, and more preferably at least about 90% identity to a native taxadiene synthase polypeptide. 
Preferably, such polypeptides also possess characteristic structural features and biological activity of 
a native taxadiene synthase polypeptide. 

Amino acid sequence "similarity" is a measure of the degree to which aligned amino acid
sequences possess identical amino acids or conservative amino acid substitutions at corresponding 
positions. 
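
The distinction between identity and similarity can be sketched for a pair of pre-aligned sequences as follows. The conservative substitution groups, the treatment of gapped columns (skipped), and all names are assumptions chosen for illustration; sequence analysis packages such as the one cited below use their own scoring schemes.

```python
# Illustrative conservative-substitution groups (an assumption, not the
# scoring matrix of any particular software package).
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("ST"), set("NQ"), set("AG"),
]


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (% identity, % similarity) over ungapped aligned columns.
    Similarity counts identical residues plus conservative substitutions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped in this sketch
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in CONSERVATIVE_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment: 9 ungapped columns, 5 identical, 3 conservative.
toy_identity, toy_similarity = identity_and_similarity("DDLVDKAF-G", "DELIDRSFWG")
```

Because every identical position is also counted as similar, percent similarity is always at least as large as percent identity, as in the pairwise values reported for the terpene cyclases in Example 1.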

A taxadiene synthase "biological activity" includes taxadiene synthase enzymatic activity as 
determined by conventional protocols (e.g., the protocol described in Hezari et al., Arch. Biochem.
Biophys. 322:437-444, 1995, incorporated herein by reference). Other biological activities of
taxadiene synthase include, but are not limited to substrate binding, immunological activity (including 
the capacity to elicit the production of antibodies that are specific for taxadiene synthase), etc. 

Polypeptide identity (homology) or similarity is typically analyzed using sequence analysis
software such as the Sequence Analysis Software Package of the Genetics Computer Group,
University of Wisconsin Biotechnology Center, Madison, WI. Polypeptide sequence analysis
software matches polypeptide sequences using measures of identity assigned to various substitutions,
deletions, and other modifications.

"Isolated," "Purified," "Homogeneous" Polypeptides. A polypeptide is "isolated" if it has
been separated from the cellular components (nucleic acids, lipids, carbohydrates, and other
polypeptides) that naturally accompany it. Such a polypeptide can also be referred to as "pure" or
"homogeneous" or "substantially" pure or homogeneous. Thus, a polypeptide which is chemically
synthesized or recombinant (i.e., the product of the expression of a recombinant nucleic acid, even
if expressed in a homologous cell type) is considered to be isolated. A monomeric polypeptide is
isolated when at least 60-90% by weight of a sample is composed of the polypeptide, preferably 95%
or more, and more preferably more than 99%. Protein purity or homogeneity is indicated, for
example, by polyacrylamide gel electrophoresis of a protein sample, followed by visualization of a
single polypeptide band upon staining the polyacrylamide gel; high pressure liquid chromatography;
or other conventional methods.

Protein purification. The polypeptides of the present invention can be purified by any of
the means known in the art. Various methods of protein purification are described, e.g., in Guide
to Protein Purification, ed. Deutscher, Meth. Enzymol. 185, Academic Press, San Diego, 1990; and
Scopes, Protein Purification: Principles and Practice, Springer Verlag, New York, 1982.

Variant forms of taxadiene synthase polypeptides; labeling. Encompassed by the taxadiene
synthase polypeptides according to an embodiment of the present invention are variant polypeptides
in which there have been substitutions, deletions, insertions or other modifications of a native
taxadiene synthase polypeptide. The variants substantially retain structural and/or biological
characteristics and are preferably silent or conservative substitutions of one or a small number of
contiguous amino acid residues. Preferably, such variant polypeptides are at least 70%, more
preferably at least 80%, and most preferably at least 90% homologous to a native taxadiene synthase
polypeptide.

The native taxadiene synthase polypeptide sequence can be modified by conventional
methods, e.g., by acetylation, carboxylation, phosphorylation, glycosylation, ubiquitination, and
labeling, whether accomplished by in vivo or in vitro enzymatic treatment of a taxadiene synthase
polypeptide or by the synthesis of a taxadiene synthase polypeptide using modified amino acids.

There are a variety of conventional methods and reagents for labeling polypeptides and 
fragments thereof. Typical labels include radioactive isotopes, ligands or ligand receptors, 
fluorophores, chemiluminescent agents, and enzymes. Methods for labeling and guidance in the
choice of labels appropriate for various purposes are discussed, e.g., in Sambrook et al. (1989) and
Ausubel et al. (1987 with periodic updates). 

Polypeptide Fragments. The present invention also encompasses fragments of taxadiene
synthase polypeptides that lack at least one residue of a native full-length taxadiene synthase
polypeptide yet retain at least one of the biological activities characteristic of taxadiene synthase, e.g.,
taxadiene synthase enzymatic activity or possession of a characteristic immunological determinant.
As an additional example, an immunologically active fragment of a taxadiene synthase polypeptide
is capable of raising taxadiene synthase-specific antibodies in a target immune system (e.g., murine
or rabbit) or of competing with taxadiene synthase for binding to taxadiene synthase-specific
antibodies, and is thus useful in immunoassays for the presence of taxadiene synthase polypeptides
in a biological sample. Such immunologically active fragments typically have a minimum size of 7
to 17 amino acids. Fragments preferably comprise at least 10, more preferably at least 20, and most
preferably at least 30 consecutive amino acids of a native taxadiene synthase polypeptide.

Fusion polypeptides. The present invention also provides fusion polypeptides including, for
example, heterologous fusion polypeptides, i.e., a taxadiene synthase polypeptide sequence or
fragment thereof and a heterologous polypeptide sequence, e.g., a sequence from a different
polypeptide. Such heterologous fusion polypeptides thus exhibit biological properties (such as ligand-
binding, catalysis, secretion signals, antigenic determinants, etc.) derived from each of the fused
sequences. Fusion partners include, for example, immunoglobulins, beta-galactosidase, trpE, protein
A, beta lactamase, alpha amylase, alcohol dehydrogenase, yeast alpha mating factor, and various
signal and leader sequences which, e.g., can direct the secretion of the polypeptide. Fusion
polypeptides are typically made by the expression of recombinant nucleic acids or by chemical
synthesis.

Polypeptide sequence determination. The sequence of a polypeptide of the present invention
can be determined by various methods known in the art. In order to determine the sequence of a
polypeptide, the polypeptide is typically fragmented, the fragments separated, and the sequence of
each fragment determined. To obtain fragments of a taxadiene synthase polypeptide, the polypeptide
can be digested with an enzyme such as trypsin, clostripain, or Staphylococcus protease, or with
chemical agents such as cyanogen bromide, o-iodosobenzoate, hydroxylamine or
2-nitro-5-thiocyanobenzoate. Peptide fragments can be separated, e.g., by reversed-phase
high-performance liquid chromatography (HPLC) and analyzed by gas-phase sequencing.

Antibodies 

The present invention also encompasses polyclonal and/or monoclonal antibodies that are
specific for taxadiene synthase, i.e., bind to taxadiene synthase and are capable of distinguishing the
taxadiene synthase polypeptide from other polypeptides under standard conditions. Such antibodies
are produced and assayed by conventional methods.

For the preparation and use of antibodies according to the present invention, including
various immunoassay techniques and applications, see, e.g., Goding, Monoclonal Antibodies:
Principles and Practice, 2d ed., Academic Press, New York, 1986; and Harlow and Lane, Antibodies:
A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1988. Taxadiene
synthase-specific antibodies are useful, for example, in: purifying taxadiene synthase polypeptides;
cloning taxadiene synthase homologs from Pacific yew or other plant species from an expression
library; as antibody probes for protein blots and immunoassays; etc.

Taxadiene synthase polypeptides and antibodies can be labeled by conventional techniques.
Suitable labels include radionuclides, enzymes, substrates, cofactors, inhibitors, fluorescent agents,
chemiluminescent agents, magnetic particles, etc.

Plant transformation and regeneration. Any well-known method for plant cell transformation,
culture, and regeneration can be employed in the practice of the present invention. Methods for
introduction of foreign DNA into plant cells include, but are not limited to: transfer involving the
use of Agrobacterium tumefaciens and appropriate Ti vectors, including binary vectors; chemically
induced transfer (e.g., with polyethylene glycol); biolistics; and microinjection. See, e.g., An et al.,
Plant Molecular Biology Manual A3:1-19, 1988,

The invention will be better understood by reference to the following Examples, which are
intended to merely illustrate the best mode now known for practicing the invention. The scope of
the invention is not to be considered limited thereto, however.

EXAMPLE 1: Cloning and Sequencing of a cDNA Encoding Taxa-4(5),11(12)-diene Synthase

Materials and Methods 
Plants, Substrates, and Standards. Four-year-old T. brevifolia saplings in active growth
were maintained in a greenhouse. [1-³H]Geranylgeranyl diphosphate (120 Ci/mol) was prepared as
described previously (LaFever et al., Arch. Biochem. Biophys. 313:139-149, 1994), and authentic
(±)-taxa-4(5),11(12)-diene was prepared by total synthesis (Rubenstein, J. Org. Chem. 60:7215-
7223, 1995).

Library Construction. Total RNA was extracted from T. brevifolia stem using the
procedures of Lewinsohn and associates (Lewinsohn et al., Plant Mol. Biol. Rep. 12:20-25, 1991)
developed for woody gymnosperm tissue. Poly(A)+ RNA was purified by chromatography on
oligo(dT)-cellulose (Pharmacia) and 5 µg of the resulting mRNA was utilized to construct a
λZAP II cDNA library according to the manufacturer's instructions (Stratagene).

PCR-Based Probe Generation and Library Screening. Comparison of six available sequences
for monoterpene, sesquiterpene, and diterpene cyclases from higher plants (Facchini and Chappell,
Proc. Natl. Acad. Sci. USA 89:11088-11092, 1992; Colby et al., J. Biol. Chem. 268:23016-23024,
1993; Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994; Back and Chappell, J. Biol.
Chem. 270:7375-7381, 1995; Sun and Karmiya, Plant Cell 6:1509-1518, 1994; and Bensen et al.,
Plant Cell 7:75-84, 1995) allowed definition of eleven homologous regions for which consensus
degenerate primers were synthesized. All twenty primers (the most carboxy terminal primer, the
most amino terminal primer, and nine internal primers in both directions) were deployed in all
possible combinations with a broad range of amplification conditions using CsCl-purified T. brevifolia
stem library phage DNA as template (Innis and Gelfand, in PCR Protocols (Innis et al., eds.), pp.
3-12, 253-258, Academic Press, San Diego, CA, 1990; Sambrook et al., 1989).
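
The primer bookkeeping in this paragraph can be checked with a short sketch. It assumes the eleven regions are numbered 1 to 11 from the amino terminus, that forward primers exist for regions 1 to 10 and reverse primers for regions 2 to 11 (which gives the twenty primers described), and that a pairing is productive only when the forward primer lies upstream of the reverse primer. The numbering and the productive-pair filter are illustrative assumptions; the text does not itemize which pairings were actually run.

```python
from itertools import product

N_REGIONS = 11  # homologous regions defined by the sequence comparison

# Most amino-terminal region: forward primer only.
# Most carboxy-terminal region: reverse primer only.
# Nine internal regions: one primer in each direction.
forward_regions = list(range(1, N_REGIONS))        # regions 1..10
reverse_regions = list(range(2, N_REGIONS + 1))    # regions 2..11

total_primers = len(forward_regions) + len(reverse_regions)

# Keep only pairings in which the forward primer is upstream of the reverse.
productive_pairs = [
    (fwd, rev)
    for fwd, rev in product(forward_regions, reverse_regions)
    if fwd < rev
]
```

Under these assumptions there are 20 primers and 55 correctly oriented pairings, each of which would then be tested across the range of amplification conditions.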

Analysis of PCR products by gel electrophoresis (Sambrook et al., 1989) indicated that only
the combination of primers CC7.2F and CC3R (see FIG. 2) generated a specific DNA fragment
(~80 bp). This DNA fragment was cloned into pT7Blue (Novagen) and sequenced (DyeDeoxy
Terminator Cycle Sequencing, Applied Biosystems), and shown to be 83 bp in length. PCR was used
to prepare approximately 1 µg of this material for random hexamer labeling with [α-³²P]dATP (Tabor
et al., in Current Protocols in Molecular Biology, Ausubel et al., Sections 3.5.9-3.5.10, 1987) and
use as a hybridization probe to screen filter lifts of 3 x plaques grown in E. coli LE392 using
standard protocols (Britten and Davidson, in Nucleic Acid Hybridisation, Hames and Higgins, eds.,
pp. 3-14, IRL Press, Oxford, 1988).

Of the plaques affording positive signals (102 total), 50 were purified through two additional
cycles of hybridization. Thirty-eight pure clones were in vivo excised as Bluescript phagemids. The
insert size was determined by PCR using T3 and T7 promoter primers, and the twelve largest clones
(insert > 2 kb) were partially sequenced.

cDNA Expression in E. coli. All of the partially sequenced, full-length inserts were either
out of frame or bore premature stop sites immediately upstream of the presumptive methionine start
codon. The latter complication likely resulted from hairpin-primed second-strand cDNA synthesis
(Old and Primrose, Principles of Gene Manipulation, 4th ed., pp. 34-35, Blackwell Scientific,
London, 1989). The 2.7-kb insert from pTb42 was cloned into frame by PCR using the
thermostable, high fidelity, blunting polymerase Pfu (Stratagene) and the FRM42 primer
(downstream of false stop codons) and T7 promoter primer. The resulting blunt fragment was ligated
into EcoRV-digested pBluescript SK(-) (Stratagene), yielding pTb42.1, and transformed into E. coli
XL1-Blue (Stratagene).

To evaluate functional expression of terpene cyclase activity, E. coli XL1-Blue cells
harboring pTb42.1 were grown (to = 0.4) on 5 ml LB medium supplemented with 100 µg/ml
ampicillin and 12.5 µg/ml tetracycline before induction with 200 µM IPTG and subsequent growth
for 4 h at 25°C. Bacteria were harvested by centrifugation (1800g, 10 min), resuspended in taxadiene
synthase assay buffer (Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995), disrupted by brief
sonication at 0-4°C, and the resulting suspension centrifuged (18,000g, 10 min) to pellet debris. The
supernatant was assayed for taxadiene synthase activity by an established protocol (Hezari et al.,
Arch. Biochem. Biophys. 322:437-444, 1995) in the presence of 15 [1-³H]geranylgeranyl
diphosphate and 1 mM MgCl2, with incubation at 31°C for 4 h. The reaction products were extracted
with pentane and the extract purified by column chromatography on silica gel as previously described
(Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995) to afford the olefin fraction, an aliquot
of which was counted by liquid scintillation spectrometry to determine ³H incorporation. Control
experiments with transformed E. coli bearing the plasmid with out-of-frame inserts were also carried
out.

The identity of the olefin product of the recombinant enzyme was verified by capillary radio-
gas chromatography ("capillary radio-GC") (Croteau and Satterwhite, J. Chromatogr. 500:349-354,
1990) as well as capillary gas chromatography-mass spectrometry ("capillary GC-MS")
using methods described previously (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995) and
authentic taxa-4(5),11(12)-diene (Rubenstein, J. Org. Chem. 60:7215-7223, 1995). For GC-MS
analysis (Hewlett-Packard 6890 GC-MSD), selected diagnostic ions were monitored: m/z 272 [P+];
257 [P+ - 15 (CH3)]; 229 [P+ - 43 (C3H7)]; 121, 122, 123 [C-ring fragment cluster]; and 107 [m/z 122
base peak - 15 (CH3)]. The origin of the highly characteristic C-ring double cleavage fragment ion
[base peak, m/z 122 (C9H14)] has been described (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995).
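
The arithmetic behind the monitored ions can be reproduced with nominal (integer) atomic masses. The sketch below assumes the parent olefin is C20H32, which is consistent with the m/z 272 parent ion; the helper name and dictionary labels are hypothetical.

```python
def nominal_mass(carbons: int, hydrogens: int) -> int:
    """Nominal mass of a hydrocarbon fragment (C = 12, H = 1)."""
    return 12 * carbons + hydrogens


parent = nominal_mass(20, 32)      # diterpene olefin, assumed C20H32
methyl = nominal_mass(1, 3)        # CH3 loss
propyl = nominal_mass(3, 7)        # C3H7 loss
c_ring = nominal_mass(9, 14)       # C-ring double cleavage fragment, C9H14

diagnostic_ions = {
    "P+": parent,
    "P+ - CH3": parent - methyl,
    "P+ - C3H7": parent - propyl,
    "C-ring fragment (base peak)": c_ring,
    "base peak - CH3": c_ring - methyl,
}
```

The computed values (272, 257, 229, 122, and 107) agree with the ions listed above; m/z 121 and 123 are the neighbors of the base peak within the C-ring fragment cluster.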
cDNA Isolation and Characterization. In general characteristics (molecular weight, divalent
metal ion requirement, kinetic constants, etc.), taxadiene synthase resembles other terpenoid cyclases
from higher plants; however, the low tissue titers of the enzyme and its instability under a broad
range of fractionating conditions impeded purification of the protein to homogeneity (Hezari et al.,
Arch. Biochem. Biophys. 322:437-444, 1995). A 10 µg sample of the electrophoretically-purified
cyclase, prepared by standard analytical procedures (Schagger and von Jagow, Anal. Biochem.
166:368-379, 1987; Towbin et al., Proc. Natl. Acad. Sci. USA 76:4350-4354, 1979), failed to provide
amino-terminal sequence via Edman degradation. Repeated attempts at trypsinization and CNBr
cleavage of comparable protein samples also failed to provide sequenceable peptides, in large part
because of very low recoveries.

As an alternate approach to cDNA library screening using protein-based oligonucleotide
probes, a PCR-based strategy was developed that was founded on a set of degenerate primers for
PCR amplification designed to recognize highly-conserved regions of six higher plant terpene cyclases
whose nucleotide sequences are known. Three of these cyclases, (-)-limonene synthase (a
monoterpene cyclase from spearmint) (Colby et al., J. Biol. Chem. 268:23016-23024, 1993), epi-
aristolochene synthase (a sesquiterpene cyclase from tobacco) (Facchini and Chappell, Proc. Natl.
Acad. Sci. USA 89:11088-11092, 1992; Back and Chappell, J. Biol. Chem. 270:7375-7381, 1995),
and casbene synthase (a diterpene cyclase from castor bean) (Mau and West, Proc. Natl. Acad. Sci.
USA 91:8497-8501, 1994), exploit reaction mechanisms similar to taxadiene synthase in the
cyclization of the respective geranyl (C10), farnesyl (C15), and geranylgeranyl (C20) diphosphate
substrates (Lin et al., Biochemistry, in press). Kaurene synthase A from Arabidopsis thaliana (Sun
and Karmiya, Plant Cell 6:1509-1518, 1994) and maize (Bensen et al., Plant Cell 7:75-84, 1995)
and (-)-abietadiene synthase from grand fir (Abies grandis; Stofer Vogel, Wildung, Vogel, and
Croteau, manuscript in preparation) exploit a quite different mechanism that involves protonation of
the terminal double bond of geranylgeranyl diphosphate to initiate cyclization to the intermediate
copalyl diphosphate followed, in the case of abietadiene synthase, by the more typical ionization of
the diphosphate ester function to initiate a second cyclization sequence to the product olefin (LaFever
et al., Arch. Biochem. Biophys. 313:139-149, 1994). The latter represents the only gymnosperm
terpene cyclase sequence presently available.

Comparison of deduced amino acid sequences between all of the cyclases targeted eleven
regions for PCR primer construction. Testing of all twenty primers in all combinations under a
broad range of amplification conditions, followed by product analysis by gel electrophoresis, revealed
that only one combination of primers (CC7.2 (forward) with CC3 (reverse), see FIG. 2 for locations)
yielded a specific DNA fragment (83 bp) using T. brevifolia library phage as template. Primer CC3
delineates a region of strong homology between (-)-limonene synthase (Colby et al., J. Biol. Chem.
268:23016-23024, 1993), epi-aristolochene synthase (Facchini and Chappell, Proc. Natl. Acad. Sci.
USA 89:11088-11092, 1992) and casbene synthase (Mau and West, Proc. Natl. Acad. Sci. USA
91:8497-8501, 1994). Primer CC7.2 was selected based on sequence comparison of the angiosperm
diterpene cyclases (Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994; Sun and
Karmiya, Plant Cell 6:1509-1518, 1994; Bensen et al., Plant Cell 7:75-84, 1995) to the recently
acquired cDNA clone encoding a gymnosperm diterpene cyclase, (-)-abietadiene synthase from grand
fir (Stofer Vogel, Wildung, Vogel, and Croteau, manuscript in preparation).

The 83 bp fragment was cloned and sequenced, and thus demonstrated to be cyclase-like.
This PCR product was ³²P-labeled for use as a hybridization probe and employed in high stringency
screening of 3 x 10^ plaques which yielded 102 positive signals. Fifty of these clones were purified
through two additional rounds of screening, in vivo excised and the inserts sized. The twelve clones
bearing the largest inserts (> 2.0 kb) were partially sequenced, indicating that they were all
representations of the same gene. Four of these inserts appeared to be full-length.

cDNA Expression in E. coli. All four of the full-length clones that were purified were out
of frame or had stop sites immediately upstream of the starting methionine codon resulting from
hairpin-primed second strand cDNA synthesis. The insert from pTb42 was cloned into frame by
PCR methods, the blunt fragment was ligated into the EcoRV-site of pBluescript SK(-), yielding
pTb42.1, and transformed into E. coli XL1-Blue.

Transformed E. coli were grown in LB medium supplemented with antibiotics and induced
with IPTG. The cells were harvested and homogenized, and the extracts were assayed for taxadiene
synthase activity using standard protocols with [1-³H]geranylgeranyl diphosphate as substrate (Hezari
et al., Arch. Biochem. Biophys. 322:437-444, 1995). The olefin fraction isolated from the reaction
mixture contained a radioactive product (~1 nmol) that was coincident on capillary radio-GC with
authentic taxa-4(5),11(12)-diene (Rt = 19.40 ± 0.13 min).

The identification of this diterpene olefin was confirmed by capillary GC-MS analysis. The
retention time (12.73 min vs. 12.72 min) and selected ion mass spectrum (Table 1) of the diterpene
olefin product were identical to those of authentic (±)-taxa-4(5),11(12)-diene (Rubenstein, J. Org.
Chem. 60:7215-7223, 1995). The origins of the selected diagnostic ions shown in Table 1, which
account for most of the full spectrum abundance, are described herein and elsewhere (Koepp et al., J.
Biol. Chem. 270:8686-8690, 1995). Because of different sample sizes, the total abundance of the
authentic standard (2.96 E') was approximately twice that of the biosynthetic olefin (1.42 E'). This,
and variation in background between runs, probably account for minor differences in relative
abundances of the high mass fragments.



Table 1: GC-MS Analysis of the Diterpene Olefin Synthesized by Recombinant Taxadiene Synthase
("Product") Compared to Authentic Taxa-4(5),11(12)-diene ("Standard")

              Relative Abundance (%)
    m/z       Product       Standard
    107       15.3          15.3
    121       14.3          14.3
    122       58.1          57.8
    123       10.2          10.3
    229       0.56          0.71
    257       0.35          0.45
    272       1.19          1.17
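
The agreement between the two columns of Table 1 can be summarized numerically. The following sketch copies the tabulated values and reports the base peak of each spectrum and the ion with the largest absolute difference; the variable names are hypothetical and no values beyond those in the table are assumed.

```python
# Relative abundances (%) copied from Table 1: m/z -> (product, standard).
TABLE_1 = {
    107: (15.3, 15.3),
    121: (14.3, 14.3),
    122: (58.1, 57.8),
    123: (10.2, 10.3),
    229: (0.56, 0.71),
    257: (0.35, 0.45),
    272: (1.19, 1.17),
}

product_total = sum(product for product, _ in TABLE_1.values())
standard_total = sum(standard for _, standard in TABLE_1.values())

# Base peak = most abundant ion in each spectrum.
product_base_peak = max(TABLE_1, key=lambda mz: TABLE_1[mz][0])
standard_base_peak = max(TABLE_1, key=lambda mz: TABLE_1[mz][1])

# Ion with the largest absolute product-vs-standard difference.
differences = {mz: abs(p - s) for mz, (p, s) in TABLE_1.items()}
largest_diff_mz = max(differences, key=differences.get)
```

Both columns sum to approximately 100%, both spectra have m/z 122 as the base peak, and no ion differs by more than about 0.3 percentage points. In proportional terms the largest discrepancies are in the low-abundance, high-mass ions (m/z 229 and 257), which matches the remark above about minor differences in the high mass fragments.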



Since identically prepared extracts of control cultures of E. coli that were transformed with
pBluescript bearing an out-of-frame insert were incapable of transforming geranylgeranyl diphosphate
to detectable levels of diterpene olefin, these results confirm that clone pTb42.1 encodes the taxadiene
synthase from Pacific yew.

Sequence Analysis. Both strands of the inserts from pTb42 and pTb42.1 were sequenced.
No mistakes were incorporated by Pfu polymerase. The pTb42.1 taxadiene synthase cDNA is 2700
nucleotides in length and contains a complete open reading frame of 2586 nucleotides (FIG. 2). The
deduced amino acid sequence indicates the presence of a putative plastidial transit peptide of
approximately 137 amino acids and a mature protein of about 725 residues (~82.5 kDa), based on
the size of the native (mature) enzyme (~79 kDa) as estimated by gel permeation chromatography
and sodium dodecyl sulfate-polyacrylamide gel electrophoresis ("SDS-PAGE") (Hezari et al., Arch.
Biochem. Biophys. 322:437-444, 1995), the characteristic amino acid content and structural features
of such amino-terminal targeting sequences, and their cleavage sites (Keegstra et al., Annu. Rev. Plant
Physiol. Plant Mol. Biol. 40:471-501, 1989; von Heijne et al., Eur. J. Biochem. 180:535-545, 1989),
and the fact that diterpene biosynthesis is localized exclusively within plastids (West et al., Rec. Adv.
Phytochem. 13:163-198, 1979; Kleinig, Annu. Rev. Plant Physiol. Plant Mol. Biol. 40:39-59, 1989).
The transit peptide/mature protein junction and thus the exact lengths of both moieties are unknown,
because the amino terminus of the mature protein is apparently blocked and has not yet been
identified.
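
The lengths quoted in this paragraph are mutually consistent, as the following sketch shows. It uses only the figures stated above; whether the 2586-nucleotide reading frame is counted with or without the stop codon is not stated, so the codon count is compared with the residue total only approximately.

```python
CDNA_NT = 2700          # full cDNA insert, nucleotides
ORF_NT = 2586           # open reading frame, nucleotides
TRANSIT_AA = 137        # putative plastidial transit peptide (approximate)
MATURE_AA = 725         # mature protein (approximate)
MATURE_KDA = 82.5       # deduced mass of the mature protein, kDa

assert ORF_NT % 3 == 0  # a complete reading frame is a whole number of codons
codons = ORF_NT // 3
preprotein_residues = TRANSIT_AA + MATURE_AA
untranslated_nt = CDNA_NT - ORF_NT

# Average residue mass implied by the deduced size of the mature protein.
mean_residue_da = MATURE_KDA * 1000 / MATURE_AA
```

The reading frame corresponds to 862 codons, the transit peptide and mature protein together account for about 862 residues, and 114 nucleotides of the cDNA lie outside the reading frame. The implied average residue mass of roughly 114 Da is in the range commonly assumed for proteins.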

Pairwise sequence comparison (Feng and Doolittle, Methods Enzymol. 183:375-387, 1990;
Genetics Computer Group, Program Manual for the Wisconsin Package, Version 8, Genetics Computer
Group, Madison, WI, 1994) with other terpene cyclases from higher plants revealed a significant
degree of sequence similarity at the amino acid level. The taxadiene synthase from yew showed 32%
identity and 55% similarity to (-)-limonene synthase from spearmint (Colby et al., J. Biol. Chem.
268:23016-23024, 1993), 30% identity and 54% similarity to epi-aristolochene synthase from tobacco
(Facchini and Chappell, Proc. Natl. Acad. Sci. USA 89:11088-11092, 1992), 31% identity and 56%
similarity to casbene synthase from castor bean (Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-
8501, 1994), and 33% identity and 56% similarity to kaurene synthase A from Arabidopsis thaliana
and maize (Sun and Karmiya, Plant Cell 6:1509-1518, 1994; Bensen et al., Plant Cell 7:75-84,
1995), and 45% identity and 67% similarity to (-)-abietadiene synthase from grand fir (Stofer Vogel,
Wildung, Vogel, and Croteau, manuscript in preparation). Pairwise comparison of other members
within this group shows roughly comparable levels of identity (30-40%) and similarity (50-60%).
These terpenoid synthases represent a broad range of cyclase types from diverse plant families,
supporting the suggestion of a common ancestry for this class of enzymes (Colby et al., J. Biol.
Chem. 268:23016-23024, 1993; Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994;
Back and Chappell, J. Biol. Chem. 270:7375-7381, 1995; McGarvey and Croteau, Plant Cell 7:1015-
1026, 1995; Chappell, Annu. Rev. Plant Physiol. Plant Mol. Biol. 46:521-547, 1995).

The amino acid sequence of taxadiene synthase does not closely resemble (identity - 20%; 
similarity -40%) that of any of the microbial sesquiterpene cyclases that have been determined 
recently (Hohn and Beremand. Gene (Amt.) 79:131-136. 1989; Proctor and Hohn. J. Biol. Chem. 
268:4543-4548. 1993; Cane et al.. Biochemistry 33:5846-5857. 1994). nor does the taxadiene 
synthase sequence resemble any of the published sequences for prenyltransferases (Chen et al.,
Protein Sci. 3:600-607, 1994; Scolnik and Bartley, Plant Physiol. 104:1469-1470, 1994; Attucci et
al., Arch. Biochem. Biophys. 321:493-500, 1995), a group of enzymes that, like the terpenoid
cyclases, employ allylic diphosphate substrates and exploit similar electrophilic reaction mechanisms
(Poulter and Rilling, in Biosynthesis of Isoprenoid Compounds, Porter and Spurgeon, eds., vol. 1,
pp. 161-224, Wiley & Sons, New York, NY, 1981). The aspartate-rich (I,L,V)XDDXX(XX)D
motif(s) found in most prenyltransferases and terpenoid cyclases (Facchini and Chappell, Proc. Natl.
Acad. Sci. USA 89:11088-11092, 1992; Colby et al., J. Biol. Chem. 268:23016-23024, 1993; Mau
and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994; Back and Chappell, J. Biol. Chem.
270:7375-7381, 1995; Hohn and Beremand, Gene (Amst.) 79:131-136, 1989; Proctor and Hohn, J.
Biol. Chem. 268:4543-4548, 1993; Cane et al., Biochemistry 33:5846-5857, 1994; Chen et al.,
Protein Sci. 3:600-607, 1994; Scolnik and Bartley, Plant Physiol. 104:1469-1470, 1994; Attucci et
al., Arch. Biochem. Biophys. 321:493-500, 1995; Abe and Prestwich, J. Biol. Chem. 269:802-804,
1994) and thought to play a role in substrate binding (Chen et al., Protein Sci. 3:600-607, 1994;
Abe and Prestwich, J. Biol. Chem. 269:802-804, 1994; Marrero et al., J. Biol. Chem. 267:21873-21878,
1992; Joly and Edwards, J. Biol. Chem. 268:26983-26989, 1993; Tarshis et al., Biochemistry
33:10871-10877, 1994) is also present in taxadiene synthase, as is a related DXXDD motif (FIG.
2). Histidine and cysteine residues have been implicated at the active sites of several terpenoid
cyclases of plant origin (Rajaonarivony et al., Arch. Biochem. Biophys. 299:77-82, 1992; Savage et
al., Arch. Biochem. Biophys. 320:257-265, 1995). A search of the aligned sequences revealed that
three histidines (at positions 370, 415 and 793) and three cysteines (at positions 329, 650 and 777)
of taxadiene synthase are conserved among the plant terpenoid cyclase genes. The taxadiene synthase
from yew most closely resembles the abietadiene synthase from grand fir rather than the casbene
synthase from castor bean (Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994), which
catalyzes a similar type of cyclization reaction but is phylogenetically quite distant. The abietadiene
synthase from grand fir is the only other terpenoid cyclase sequence from a gymnosperm now
available (Stofer Vogel, Wildung, Vogel, and Croteau, in preparation), and these two diterpene
cyclases from the coniferales share several regions of significant sequence homology, one of which
was fortuitously chosen for primer construction and proved to be instrumental in the acquisition of
a PCR-derived probe that led to the cloning of taxadiene synthase.
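
The motif notation used above can be turned into a simple pattern search. The following is a minimal sketch, assuming that X stands for any residue and that the optional (XX) is either present or absent as a pair; the function name and the toy sequence are hypothetical and are not taken from FIG. 2.

```python
import re

# Patterns transcribed from the motif notation in the text (X = any residue).
# The regular expressions are an illustrative reading of that notation.
MOTIFS = {
    "(I,L,V)XDDXX(XX)D": r"[ILV].DD..(?:..)?D",
    "DDXXD": r"DD..D",
    "DXXDD": r"D..DD",
}

def find_motifs(protein):
    """Return {motif name: [(1-based start, matched text), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        scanner = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in scanner.finditer(protein)]
    return hits

# Hypothetical toy sequence for illustration only (not the FIG. 2 sequence).
toy = "MAQLVLDDMADTFGKRDSYDDAYT"
for name, found in find_motifs(toy).items():
    print(name, found)
```

Residue numbers reported by such a scan are 1-based so that they can be compared directly with position numbers of the kind quoted in the text.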

EXAMPLE 3: Expression of Taxadiene Synthase Genes Truncated to Remove Transit Peptide
Sequences

The native taxadiene synthase gene sequence was truncated from the 5' end to remove part
or all of the sequence that encodes the plastidial transit peptide of approximately 137 amino acids (the
mature taxadiene synthase polypeptide is about 725 amino acids). Deletion mutants were produced
that remove amino acid residues from the amino terminus up to residue 31 (Glu), 39 (Ser), 49 (Ser),
54 (Gly), 79 (Val), or 82 (Ile). These mutants were expressed in E. coli cells and cell extracts were
assayed for taxadiene synthase activity as described above. In preliminary experiments, expression
of truncation mutants was increased over wild-type taxadiene synthase by up to about 50%, with
further truncation past residues 83-84 apparently decreasing taxadiene synthase activity.
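
The lengths involved in these truncations can be checked with a few lines of arithmetic. This is a sketch only: the 862-residue full length is the value given elsewhere in the specification for the deduced polypeptide, and whether the named residue is itself retained after truncation is an assumption exposed here as a parameter, since the text does not say.

```python
FULL_LENGTH = 862      # residues in the deduced preprotein (stated in the description)
TRANSIT_PEPTIDE = 137  # approximate transit peptide length given in this example

# Residue numbers named as truncation points in this example.
TRUNCATION_SITES = {31: "Glu", 39: "Ser", 49: "Ser", 54: "Gly", 79: "Val", 82: "Ile"}

def remaining_length(site, keep_site_residue=True):
    """Residues left after deleting the amino terminus up to `site`.

    Assumption: by default the named residue becomes the new first residue.
    """
    removed = site - 1 if keep_site_residue else site
    return FULL_LENGTH - removed

# 862 - 137 = 725, consistent with "about 725 amino acids" for the mature enzyme.
mature_length = FULL_LENGTH - TRANSIT_PEPTIDE

for site, residue in sorted(TRUNCATION_SITES.items()):
    print(f"{residue}{site}: {remaining_length(site)} residues remain")
print("mature length:", mature_length)
```

All six truncation points fall well short of residue 137, so each construct would still carry part of the presumptive transit peptide.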

Truncation of at least part of the plastidial transit peptide improves taxadiene synthase
expression. Moreover, removing this sequence improves purification of taxadiene synthase, since
the transit peptide is recognized by E. coli chaperonins, which co-purify with the enzyme and
complicate purification, and because the taxadiene synthase preprotein tends to form inclusion bodies
when expressed in E. coli.

The actual cleavage site for removal of the transit peptide may not be at the predicted
cleavage site between residue 136 (Ser) and residue 137 (Pro). A transit peptide of 136 residues
appears quite long, and other (monoterpene) synthases have a tandem pair of arginines (Arg-Arg) at
about residue 60 (Met). Truncation immediately amino-terminal to the tandem pair of arginines of
these synthases has resulted in excellent expression in E. coli. Taxadiene synthase lacks an Arg-Arg
element. Also, truncation beyond residues 83-84 leads to lower activity.

This invention has been detailed both by example and by direct description. It should be
apparent that one having ordinary skill in the relevant art would be able to surmise equivalents to the
invention as described in the claims which follow but which would be within the spirit of the
foregoing description. Those equivalents are to be included within the scope of this invention.

WHAT IS CLAIMED IS:

1. An isolated polynucleotide comprising at least 15 consecutive nucleotides of a native 
taxadiene synthase gene. 

2. An isolated polynucleotide comprising at least 30 consecutive nucleotides of a native
taxadiene synthase gene.

3. The polynucleotide of claim 1 comprising a sequence that encodes a polypeptide having 
taxadiene synthase biological activity. 

4. A cell comprising the polynucleotide of claim 1.

5. A plant cell comprising the polynucleotide of claim 1. 

6. A transgenic plant comprising the polynucleotide of claim 1. 

7. An isolated polynucleotide comprising a polypeptide-encoding sequence that encodes a
polypeptide with taxadiene synthase biological activity, wherein the polynucleotide-encoding sequence
has at least 70% nucleotide sequence similarity with a native Pacific yew taxadiene synthase gene.

8. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence has at least
80% nucleotide sequence similarity with the native Pacific yew taxadiene synthase gene.

9. The polynucleotide of claim 8 wherein the polypeptide-encoding sequence has at least 
90% nucleotide sequence similarity with the native Pacific yew taxadiene synthase gene. 

10. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide having only conservative amino acid substitutions to the taxadiene synthase polypeptide
sequence of FIG. 2 except for an amino acid substitution at at least one location selected from the
group consisting of: cysteine residues 329, 650, 719, and 777; histidine residues 370, 415, 579, and
793; a DDXXD motif; a DXXDD motif; a conserved arginine; and a RWWK element.

11. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide having only conservative amino acid substitutions to a native Pacific yew taxadiene
synthase polypeptide sequence.

12. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide that lacks at least part of a transit peptide sequence of a native Pacific yew taxadiene
synthase polypeptide.

13. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide that is completely homologous with a native taxadiene synthase polypeptide.

14. A cell comprising the polynucleotide of claim 7. 

15. A plant cell comprising the polynucleotide of claim 7. 

16. A transgenic plant comprising the polynucleotide of claim 7. 

17. An isolated polypeptide having taxadiene synthase activity. 

18. The polypeptide of claim 17 having at least 70% amino acid sequence identity with a 
native Pacific yew taxadiene synthase polypeptide. 

19. The polypeptide of claim 18 having at least 80% amino acid sequence identity with the
taxadiene synthase polypeptide.

20. The polypeptide of claim 19 having at least 90% amino acid sequence identity with the
taxadiene synthase polypeptide.

21. The polypeptide of claim 17 having only conservative substitutions to the native Pacific
yew taxadiene synthase polypeptide sequence except for an amino acid substitution at one or more
locations of the native Pacific yew taxadiene synthase polypeptide sequence that are selected from
the group consisting of: cysteine residues 329, 650, 719, and 777; histidine residues 370, 415, 579,
and 793; a DDXXD motif; a DXXDD motif; a conserved arginine; and a RWWK element.

22. The polypeptide of claim 17 having only conservative amino acid substitutions to the
native Pacific yew taxadiene synthase polypeptide sequence.

23. The polypeptide of claim 17 that is completely homologous with the native Pacific yew
taxadiene synthase polypeptide sequence.

24. The polypeptide of claim 17 lacking part or all of a transit peptide.

25. An isolated polypeptide comprising at least 10 consecutive amino acids of a native 
Pacific yew taxadiene synthase polypeptide. 

26. An isolated mature native Pacific yew taxadiene synthase polypeptide.

27. An antibody specific for a native Pacific yew taxadiene synthase polypeptide.

28. A method of expressing a taxadiene synthase polypeptide in a cell, the method
comprising the steps of:

providing a cell that comprises an expressible polynucleotide that encodes a taxadiene
synthase polypeptide according to claim 17; and

culturing the cell under conditions suitable for expression of the polypeptide.

29. The method of claim 28 wherein the cell is a taxoid-producing cell.

30. The method of claim 29 wherein expression of the polynucleotide causes the cell to
produce a higher level of a taxoid than an otherwise similar cell that lacks the expressible
polynucleotide.

31. A method of obtaining a taxadiene synthase gene comprising the steps of:
contacting a nucleic acid of a taxoid-producing organism with a probe or primer comprising
a polynucleotide of claim 1 under stringent hybridization conditions, thereby causing the probe or
primer to hybridize to a taxadiene synthase gene of the organism;
and isolating the taxadiene synthase gene of the organism.

[FIG. 2. Nucleotide and predicted amino acid sequence of Pacific yew taxadiene synthase clone pTb 42.1 (nucleotides 1-2700; amino acid residues 1-862); sequence listing not reproduced.]

COMPOSITIONS AND METHODS FOR TAXOL BIOSYNTHESIS

CROSS REFERENCE TO RELATED CASE
This application claims the benefit of U.S. Provisional Application No. 60/015,993, filed
April 15, 1996, incorporated herein by reference.

TECHNICAL FIELD

This invention is related to the field of detection of diterpenoid biosynthesis, particularly to
the biosynthesis of taxoid compounds such as Taxol.

ACKNOWLEDGMENT OF GOVERNMENT SUPPORT
This invention was made with government support under National Institutes of Health Grant
No. CA-55254. The government has certain rights in this invention.

BACKGROUND ART 
The highly functionalized diterpenoid Taxol (Wani et al., J. Am. Chem. Soc. 93:2325-2327,
1971) is well-established as a potent chemotherapeutic agent (Holmes et al., in Taxane Anticancer
Agents: Basic Science and Current Status, Georg et al., eds., pp. 31-57, American Chemical Society,
Washington, DC, 1995; Arbuck and Blaylock, in Taxol: Science and Applications, Suffness, ed.,
pp. 379-415, CRC Press, Boca Raton, FL, 1995). (Paclitaxel is the generic name for Taxol, a
registered trademark of Bristol-Myers Squibb.)

The supply of Taxol from the original source, the bark of the Pacific yew (Taxus brevifolia
Nutt.; Taxaceae) is limited. As a result, there have been intensive efforts to develop alternate means
of production, including isolation from the foliage and other renewable tissues of plantation-grown
Taxus species, biosynthesis in tissue culture systems, and semisynthesis of Taxol and its analogs from
advanced taxane diterpenoid (taxoid) metabolites that are more readily available (Cragg et al., J. Nat.
Prod. 56:1657-1668, 1993). Total synthesis of Taxol, at present, is not commercially viable
(Borman, Chem. Eng. News 72(7):32-34, 1994), and it is clear that in the foreseeable future the
supply of Taxol and its synthetically useful progenitors must rely on biological methods of
production, either in Taxus plants or in cell cultures derived therefrom (Suffness, in Taxane
Anticancer Agents: Basic Science and Current Status, Georg et al., eds., American Chemical
Society, Washington, DC, 1995, pp. 1-17).

The biosynthesis of Taxol involves the initial cyclization of geranylgeranyl diphosphate, the
universal precursor of diterpenoids (West, in Biosynthesis of Isoprenoid Compounds, Porter and
Spurgeon, eds., vol. 1, pp. 375-411, Wiley & Sons, New York, NY, 1981), to
taxa-4(5),11(12)-diene (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995) followed by extensive
oxidative modification of this olefin (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995; Croteau et
al., in Taxane Anticancer Agents: Basic Science and Current Status, Georg et al., eds., pp. 72-80,
American Chemical Society, Washington, DC, 1995) and elaboration of the side chains (FIG. 1)
(Floss and Mocek, in Taxol: Science and Applications, Suffness, ed., pp. 191-208, CRC Press, Boca
Raton, FL, 1995).

WO 97/38571    PCT/US97/06320

Taxa-4(5),11(12)-diene synthase ("taxadiene synthase"), the enzyme responsible for the
initial cyclization of geranylgeranyl diphosphate, to delineate the taxane skeleton, has been isolated
from T. brevifolia stem tissue, partially purified, and characterized (Hezari et al., Arch. Biochem.
Biophys. 322:437-444, 1995).

Although taxadiene synthase resembles other plant terpenoid cyclases in general enzymatic
properties (Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995), it has proved extremely
difficult to purify in sufficient amounts for antibody preparation or microsequencing, thwarting this
approach toward cDNA cloning.

SUMMARY OF THE INVENTION
We have cloned and sequenced the taxadiene synthase gene of Pacific yew. 
One embodiment of the invention includes isolated polynucleotides comprising at least 15
consecutive nucleotides, preferably at least 20, more preferably at least 25, and most preferably at
least 30 consecutive nucleotides of a native taxadiene synthase gene, e.g., the taxadiene synthase gene
of Pacific yew. Such polynucleotides are useful, for example, as probes and primers for obtaining
homologs of the taxadiene synthase gene of Pacific yew by, for example, contacting a nucleic acid
of a taxoid-producing organism with such a probe or primer under stringent hybridization conditions
to permit the probe or primer to hybridize to a taxadiene synthase gene of the organism, then
isolating the taxadiene synthase gene of the organism to which the probe or primer hybridizes.

Another embodiment of the invention includes isolated polynucleotides comprising a 
sequence that encodes a polypeptide having taxadiene synthase biological activity. Preferably, the 
polypeptide-encoding sequence has at least 70%, preferably at least 80%, and more preferably at least
90% nucleotide sequence similarity with a native Pacific yew taxadiene synthase polynucleotide gene.

In preferred embodiments of such polynucleotides, the polypeptide-encoding sequence 
encodes a polypeptide having only conservative amino acid substitutions to the native Pacific yew 
taxadiene synthase polypeptide, except, in some embodiments, for amino acid substitutions at one or 
more of: cysteine residues 329, 650, 719, and 777; histidine residues 370, 415, 579, and 793; a
DDXXD motif; a DXXDD motif; a conserved arginine; and a RWWK element. Preferably, the
encoded polypeptide has only conservative amino acid substitutions to or is completely homologous
with the native Pacific yew taxadiene synthase polypeptide. In addition, the encoded polypeptide
preferably lacks at least part of the transit peptide. Also included are cells, particularly plant cells,
and transgenic plants that include such polynucleotides and the encoded polypeptides. 

Another embodiment of the invention includes isolated polypeptides having taxadiene
synthase activity, preferably having at least 70%, more preferably at least 80%, and most preferably
at least 90% homology with a native taxadiene synthase polypeptide. Also included are isolated
polypeptides that comprise at least 10, preferably at least 20, more preferably at least 30 consecutive
amino acids of a native Pacific yew taxadiene synthase, and most preferably the mature Pacific yew
taxadiene synthase polypeptide (i.e., lacking only the transit peptide).

Another embodiment of the invention includes antibodies specific for a native Pacific yew 
taxadiene synthase polypeptide. 

Another embodiment of the invention includes methods of expressing a taxadiene synthase 
polypeptide in a cell, e.g., a taxoid-producing cell, by culturing a cell that includes an expressible
polynucleotide encoding a taxadiene synthase polypeptide under conditions suitable for expression of
the polypeptide, preferably resulting in the production of the taxoid at levels that are higher than 
would be expected from an otherwise similar cell that lacks the expressible polynucleotide. 

The foregoing and other objects and advantages of the invention will become more apparent 
from the following detailed description and accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows steps in the biosynthesis of Taxol, including the initial cyclization of
geranylgeranyl diphosphate to taxa-4(5),11(12)-diene, followed by extensive oxidative modification
and elaboration of the side chains.
FIG. 2 shows the nucleotide and predicted amino acid sequence of Pacific yew taxadiene
synthase clone pTb 42.1. The start and stop codons are underlined. The locations of regions
employed for primer synthesis are double underlined. The DDMAD and DSYDD motifs are in
boldface. Conserved histidines (H) and cysteines (C) and an RWWK element are indicated by boxes.
Truncation sites for removal of part or all of the transit peptide are indicated by a triangle (▼).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A homology-based cloning strategy using the polymerase chain reaction (PCR) was employed
to isolate a cDNA encoding taxadiene synthase. A set of degenerate primers was constructed based
on consensus sequences of related monoterpene, sesquiterpene, and diterpene cyclases. Two of these
primers amplified an 83 base pair (bp) fragment that was cyclase-like in sequence and that was
employed as a hybridization probe to screen a cDNA library constructed from poly(A)+ RNA
extracted from Pacific yew stems. Twelve independent clones with insert size in excess of two
kilobase pairs (kb) were isolated and partially sequenced.
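
A degenerate primer stands for a pool of related oligonucleotides, and the size of that pool can be counted from the standard IUPAC ambiguity codes. The sketch below is illustrative only: the example primers are hypothetical (GAY is the degenerate codon for aspartate) and are not the primers used in this work.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical primer covering an Asp-Asp-X-X-Asp stretch (not from the patent).
print(degeneracy("GAYGAYNNNNNNGAY"))
print(expand("GAYGAY"))
```

Counting degeneracy before synthesis is a common sanity check, because a highly degenerate pool dilutes the fraction of primer molecules that match any one target exactly.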

One of these cDNA isolates was functionally expressed in Escherichia coli, yielding a
protein that was catalytically active in converting geranylgeranyl diphosphate to a diterpene olefin that
was confirmed to be taxa-4(5),11(12)-diene by combined capillary gas chromatography-mass
spectrometry (Satterwhite and Croteau, J. Chromatography 452:61-73, 1988).

The taxa-4(5),11(12)-diene synthase cDNA sequence specifies an open reading frame of 2586
nucleotides. The deduced polypeptide sequence contains 862 amino acid residues and has a molecular
weight of 98,303, compared to about 79,000 previously determined for the mature native enzyme.
It therefore appears to be full-length and includes a long presumptive plastidial targeting peptide.
Sequence comparisons with monoterpene, sesquiterpene, and diterpene cyclases of plant origin
indicate a significant degree of similarity between these enzymes; the taxadiene synthase most closely
resembles (46% identity, 67% similarity) abietadiene synthase, a diterpene cyclase from grand fir.
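
The figures quoted above can be cross-checked with simple arithmetic. This is a rough sketch under stated assumptions: it treats the 2586-nucleotide reading frame as coding sequence without a stop codon, and it uses the mean residue mass of this particular polypeptide to convert a mass difference into an approximate residue count, which is only a coarse estimate given that the 79,000 figure is itself approximate.

```python
ORF_NUCLEOTIDES = 2586        # open reading frame length stated in the text
RESIDUES = 862                # deduced polypeptide length stated in the text
PREDICTED_MW = 98303          # deduced molecular weight of the preprotein
MATURE_MW_MEASURED = 79000    # approximate value reported for the mature enzyme

# Three nucleotides per codon: 2586 / 3 = 862 with no remainder.
codons, remainder = divmod(ORF_NUCLEOTIDES, 3)

# Mean mass per residue for this polypeptide (about 114).
mean_residue_mass = PREDICTED_MW / RESIDUES

# Mass not accounted for by the mature enzyme, and the residue count it implies.
mass_difference = PREDICTED_MW - MATURE_MW_MEASURED
approx_extra_residues = mass_difference / mean_residue_mass

print(codons, remainder)
print(round(mean_residue_mass, 1))
print(mass_difference, round(approx_extra_residues))
```

The mass difference corresponds to well over a hundred residues, which is in keeping with the statement that the deduced sequence includes a long targeting peptide.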

Uses of the Taxadiene Synthase Gene 

Increasing Taxol Biosynthesis in Transformed Cells. The committed step of Taxol
(paclitaxel) biosynthesis is the initial cyclization of geranylgeranyl diphosphate, a ubiquitous
isoprenoid intermediate, catalyzed by taxadiene synthase, a diterpene cyclase. The product of this
reaction is the parent olefin with a taxane skeleton, taxa-4(5),11(12)-diene. For a review of taxoids
and taxoid biochemistry, see, e.g., Kingston et al., "The Taxane Diterpenoids," Progress in the
Chemistry of Organic Natural Products, vol. 61, Springer Verlag, New York, 1993, pp. 1-206.

The committed cyclization step of the target pathway is a slow step in the extended
biosynthetic sequence leading to Taxol and related taxoids (Koepp et al., J. Biol. Chem. 270:8686-8690,
1995; Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995). The yield of Taxol and
related taxoids (e.g., cephalomannine, baccatins, taxinines, among others) in cells of an organism
capable of taxoid biosynthesis is increased by the expression in such cells of a recombinant taxadiene
synthase gene.

This approach to increasing taxoid biosynthesis can be used in any organism that is capable
of taxoid biosynthesis. Taxol synthesis is known to take place, for example, in the Taxaceae,
including Taxus species from all over the world (including, but not limited to, T. brevifolia, T.
baccata, T. x media, T. cuspidata, T. canadensis, and T. chinensis), as well as in certain
microorganisms. Taxol may also be produced by a fungus, Taxomyces andreanae (Stierle et al.,
Science 260:214, 1993).

Agrobacterium tumefaciens-mediated transformation of Taxus species has been described and
the resulting callus cultures shown to produce Taxol (Han et al., Plant Sci. 95:187-196, 1994).

Taxol can be isolated from cells transformed with the taxadiene synthase gene by 
conventional methods. The production of callus and suspension cultures of Taxus, and the isolation 
of Taxol and related compounds from such cultures, has been described (for example, in Fett-Neto
et al., Bio/Technology 10:1572-1575, 1992).

Biosynthesis of taxoids in microorganisms. As discussed below, taxadiene synthase activity
was observed in transformed E. coli host cells expressing recombinant taxadiene synthase. Taxadiene
synthase does not require extensive post-translational modification, as provided, for example, in
mammalian cells, for enzymatic function. As a result, functional taxadiene synthase can be expressed
in a wide variety of host cells.

Geranylgeranyl diphosphate, a substrate of taxadiene synthase, is produced in a wide variety
of organisms, including bacteria and yeast that synthesize carotenoid pigments (e.g., Serratia spp.
and Rhodotorula spp.). Introduction of vectors capable of expressing taxadiene synthase in such
microorganisms permits the production of large amounts of taxa-4(5),11(12)-diene and related
compounds having the taxane backbone. The taxane backbone thus produced is useful as a chemical
feedstock. Simple taxoids, for example, would be useful as perfume fixatives.

Cloning taxadiene synthase homologs and related genes. The availability of the taxadiene
synthase gene from Pacific yew makes possible the cloning of homologs of taxadiene synthase from
other organisms capable of taxoid biosynthesis, particularly Taxus spp. Although the proportion of
common taxoids varies with the species or cultivar of yew tested, apparently all Taxus species
synthesize taxoids, including Taxol, to some degree (see, e.g., Mattina and Paiva, J. Environ. Hort.
10:187-191, 1992; Miller, J. Natural Products 43:425-437, 1980). Taxol may also be produced by
a fungus, Taxomyces andreanae (Stierle et al., Science 260:214, 1993).

A taxadiene synthase gene can be isolated from any organism capable of producing Taxol
or related taxoids by using primers or probes based on the Pacific yew taxadiene synthase gene
sequence or antibodies specific for taxadiene synthase by conventional methods.

Modified forms of taxadiene synthase gene and polypeptide. Knowledge of the taxadiene
synthase gene sequence permits the modification of the sequence, as described more fully below, to
produce variant forms of the gene and the polypeptide gene product. For example, the plastidial
transit peptide can be removed and/or replaced by other transit peptides to allow the gene product
to be directed to various intracellular compartments or exported from a host cell.

DEFINITIONS AND METHODS 

The following definitions and methods are provided to better define the present invention
and to guide those of ordinary skill in the art in the practice of the present invention. Definitions of
common terms in molecular biology may also be found in Rieger et al., Glossary of Genetics:
Classical and Molecular, 5th edition, Springer-Verlag, New York, 1991; and Lewin, Genes V,
Oxford University Press, New York, 1994.

The term "plant" encompasses any plant and progeny thereof. The term also encompasses
parts of plants, including seed, cuttings, tubers, fruit, flowers, etc.

A "reproductive unit" of a plant is any totipotent part or tissue of the plant from which one
can obtain a progeny of the plant, including, for example, seeds, cuttings, buds, bulbs, somatic
embryos, cultured cells (e.g., callus or suspension cultures), etc.

Nucleic Acids 

Nucleic acids (a term used interchangeably with "polynucleotides" herein) that are useful in
the practice of the present invention include the isolated taxadiene synthase gene, its homologs in
other plant species, and fragments and variants thereof.

The term "taxadiene synthase gene" refers to a nucleic acid that contains a taxa-4(5),11(12)-diene
synthase sequence, preferably a nucleic acid that encodes a polypeptide having taxadiene
synthase enzymatic activity. This term relates primarily to the isolated full-length taxadiene synthase
cDNA from Pacific yew discussed above and shown in FIG. 2 and the corresponding genomic
sequence (including flanking or internal sequences operably linked thereto, including regulatory
elements and/or intron sequences).

This term also encompasses alleles of the taxadiene synthase gene from Pacific yew.

"Native". The term "native" refers to a naturally-occurring ("wild-type") nucleic acid or
polypeptide.

"Homolog". A "homolog" of the taxadiene synthase gene is a gene sequence encoding a
taxadiene synthase isolated from an organism other than Pacific yew.

"Isolated". An "isolated" nucleic acid is one that has been substantially separated or purified
away from other nucleic acid sequences in the cell of the organism in which the nucleic acid naturally
occurs, i.e., other chromosomal and extrachromosomal DNA and RNA, by conventional nucleic acid-purification
methods. The term also embraces recombinant nucleic acids and chemically synthesized
nucleic acids.

Fragments, probes, and primers. A fragment of a taxadiene synthase nucleic acid according
to the present invention is a portion of the nucleic acid that is less than full-length and comprises at
least a minimum length capable of hybridizing specifically with the taxadiene synthase nucleic acid
of Figure 2 under stringent hybridization conditions. The length of such a fragment is preferably
15-17 nucleotides or more.

Nucleic acid probes and primers can be prepared based on the taxadiene synthase gene
sequence provided in FIG. 2. A "probe" is an isolated DNA or RNA attached to a detectable label
or reporter molecule, e.g., a radioactive isotope, ligand, chemiluminescent agent, or enzyme.
"Primers" are isolated nucleic acids, generally DNA oligonucleotides 15 nucleotides or more in
length, that are annealed to a complementary target DNA strand by nucleic acid hybridization to form
a hybrid between the primer and the target DNA strand, then extended along the target DNA strand
by a polymerase, e.g., a DNA polymerase. Primer pairs can be used for amplification of a nucleic
acid sequence, e.g., by the polymerase chain reaction (PCR) or other conventional nucleic-acid
amplification methods.
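
The relationship between a primer and the strand it anneals to can be illustrated with a reverse-complement calculation. The sketch below is a simplification: it looks for exact matches only, whereas real hybridization tolerates some mismatch, and the template shown is a hypothetical sequence, not one taken from FIG. 2. The function names are likewise illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def primer_site(template, primer, min_length=15):
    """Locate a primer on a template given as its top (5'->3') strand.

    Returns ("forward", index) when the primer matches the top strand,
    ("reverse", index) when its reverse complement does, or None.
    The index is the 0-based start of the matched region on the top strand.
    """
    template, primer = template.upper(), primer.upper()
    if len(primer) < min_length:
        raise ValueError("primer shorter than the minimum length")
    index = template.find(primer)
    if index != -1:
        return ("forward", index)
    index = template.find(reverse_complement(primer))
    if index != -1:
        return ("reverse", index)
    return None

# Hypothetical template for illustration only.
template = "ATGGCTCAGCTCTCATTCAATGCAGCACTGAAGATGAACGCA"
forward_primer = template[:18]
reverse_primer = reverse_complement(template[-18:])

print(primer_site(template, forward_primer))
print(primer_site(template, reverse_primer))
```

The default minimum of 15 nucleotides mirrors the primer length given in the definition above; a primer pair found this way brackets the region that PCR would amplify.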

Methods for preparing and using probes and primers are described, for example, in
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., vol.
1-3, ed. Sambrook et al., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989;
Current Protocols in Molecular Biology, ed. Ausubel et al., Greene Publishing and Wiley-Interscience,
New York, 1987 (with periodic updates); and Innis et al., PCR Protocols: A Guide to
Methods and Applications, Academic Press: San Diego, 1990. PCR-primer pairs can be derived
from a known sequence, for example, by using computer programs intended for that purpose such
as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA).

Nucleotide sequence similarity. Nucleotide sequence "similarity" is a measure of the degree
to which two polynucleotide sequences have identical nucleotide bases at corresponding positions in
their sequence when optimally aligned (with appropriate nucleotide insertions or deletions). Sequence
similarity can be determined using sequence analysis software such as the Sequence Analysis
Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center,
Madison, WI. Preferably, a variant form of a taxadiene synthase polynucleotide has at least 70%,
more preferably at least 80%, and most preferably at least 90% nucleotide sequence similarity with
a native taxadiene synthase gene, particularly with a native Pacific yew taxadiene synthase, as
provided in FIG. 2.
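
As an illustration of the definition above, the percentage of identical bases can be computed once two sequences have been aligned. This is a minimal sketch, not the method of the software package named in the text: it assumes the alignment has already been made, marks insertions or deletions with "-", and counts a base opposite a gap as a mismatch, which is one convention among several. The sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned positions carrying identical bases.

    Assumptions: both strings come from the same alignment (equal length),
    columns where both sequences have a gap are ignored, and a base
    opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments for illustration only.
native = "ATGGC-CAGT"
variant = "ATGACTCAGT"

similarity = percent_identity(native, variant)
print(similarity)
print(similarity >= 70.0)
```

A value computed this way can be compared against the 70%, 80%, and 90% thresholds recited above, bearing in mind that the result depends on the alignment and on how gaps are scored.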

Operably linked. A first nucleic-acid sequence is "operably" linked with a second nucleic-acid 
sequence when the first nucleic-acid sequence is placed in a functional relationship with the 
second nucleic-acid sequence. For instance, a promoter is operably linked to a coding sequence if 
the promoter affects the transcription or expression of the coding sequence. Generally, operably 
linked DNA sequences are contiguous and, where necessary to join two protein coding regions, in 
reading frame. 

"Recombinant". A "recombinant" nucleic acid is an isolated nucleic acid made by an 
artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis 
or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques. 

Techniques for nucleic-acid manipulation are described generally in, for example, Sambrook 
et al. (1989) and Ausubel et al. (1987, with periodic updates). Methods for chemical synthesis of 
nucleic acids are discussed, for example, in Beaucage and Carruthers, Tetra. Letts. 22:1859-1862, 
1981, and Matteucci et al., J. Am. Chem. Soc. 103:3185, 1981. Chemical synthesis of nucleic acids 
can be performed, for example, on commercial automated oligonucleotide synthesizers. 

Preparation of recombinant or chemically synthesized nucleic acids; vectors, transformation, 
host cells. Natural or synthetic nucleic acids according to the present invention can be incorporated 
into recombinant nucleic-acid constructs, typically DNA constructs, capable of introduction into and 
replication in a host cell. Such a construct preferably is a vector that includes a replication system 
and sequences that are capable of transcription and translation of a polypeptide-encoding sequence 
in a given host cell. For the practice of the present invention, conventional compositions and 
methods for preparing and using vectors and host cells are employed, as discussed, inter alia, in 
Sambrook et al., 1989, or Ausubel et al., 1987. 

A "transformed" or "transgenic" cell, tissue, organ, or organism is one into which a foreign 
nucleic acid has been introduced. A "transgenic" or "transformed" cell or organism also includes 
(1) progeny of the cell or organism and (2) progeny produced from a breeding program employing 
a "transgenic" plant as a parent in a cross and exhibiting an altered phenotype resulting from the 
presence of the "transgene," i.e., the recombinant taxadiene synthase nucleic acid. 

Nucleic-Acid Hybridization; "Stringent Conditions"; "Specific". The nucleic-acid probes 
and primers of the present invention hybridize under stringent conditions to a target DNA sequence, 
e.g., to the taxadiene synthase gene. 

The term "stringent conditions" is functionally defined with regard to the hybridization of 
a nucleic-acid probe to a target nucleic acid (i.e., to a particular nucleic-acid sequence of interest) 
by the hybridization procedure discussed in Sambrook et al., 1989, at 9.52-9.55. See also 
Sambrook et al., 1989, at 9.47-9.52, 9.56-9.58; Kanehisa, Nucl. Acids Res. 12:203-213, 1984; and 
Wetmur and Davidson, J. Mol. Biol. 31:349-370, 1968. 

Regarding the amplification of a target nucleic-acid sequence (e.g., by PCR) using a 
particular amplification primer pair, stringent conditions are conditions that permit the primer pair 
to hybridize only to the target nucleic-acid sequence to which a primer having the corresponding 
wild-type sequence (or its complement) would bind and preferably to produce a unique amplification 
product. 

The term "specific for (a target sequence)" indicates that a probe or primer hybridizes under 
stringent conditions only to the target sequence in a sample comprising the target sequence. 

Nucleic-acid amplification. As used herein, "amplified DNA" refers to the product of 
nucleic-acid amplification of a target nucleic-acid sequence. Nucleic-acid amplification can be 
accomplished by any of the various nucleic-acid amplification methods known in the art, including 
the polymerase chain reaction (PCR). A variety of amplification methods are known in the art and 
are described, inter alia, in U.S. Patent Nos. 4,683,195 and 4,683,202 and in PCR Protocols: A 
Guide to Methods and Applications, Innis et al., eds., Academic Press, San Diego, 1990. 

Methods of making cDNA clones encoding taxadiene synthase or homologs thereof. Based 
upon the availability of the taxadiene synthase cDNA as disclosed herein, other taxadiene synthase 
genes (e.g., alleles and homologs of taxadiene synthase) can be readily obtained from a wide variety 
of plants by cloning methods known in the art. 

One or more primer pairs based on the taxadiene synthase sequence can be used to amplify 
such taxadiene synthase genes or their homologs by the polymerase chain reaction (PCR) or other 
conventional amplification methods. Alternatively, the disclosed taxadiene synthase cDNA or 
fragments thereof can be used to probe a cDNA or genomic library made from a given plant species 
by conventional methods. 

Cloning of the taxadiene synthase genomic sequence and homologs thereof. The availability 
of the taxadiene synthase cDNA sequence enables those skilled in the art to obtain a genomic clone 
corresponding to the taxadiene synthase cDNA (including the promoter and other regulatory regions 
and intron sequences) and to determine its nucleotide sequence by conventional methods. 

Virtually all Taxus species synthesize taxoids, including Taxol, to some degree (see, e.g., 
Mattina and Paiva, J. Environ. Hort. 10:187-191, 1992; Miller, J. Natural Products 43:425-437, 
1980). Any organism that produces taxoids would be expected to express a homolog of taxadiene 
synthase. Taxadiene synthase genes can be obtained by hybridization of a Pacific yew taxadiene 
synthase probe to a cDNA or genomic library of a target species. Such a homolog can also be 
obtained by PCR or other amplification method from genomic DNA or RNA of a target species using 
primers based on the taxadiene synthase sequence shown in FIG. 2. Genomic and cDNA libraries 
from yew or other plant species can be prepared by conventional methods. 

Primers and probes based on the sequence shown in FIG. 2 can be used to confirm (and, 
if necessary, to correct) the taxadiene synthase sequence by conventional methods. 

Nucleotide-Sequence Variants of taxadiene synthase cDNA and Amino Acid Sequence 
Variants of taxadiene synthase Protein. Using the nucleotide and the amino-acid sequence of the 
taxadiene synthase protein disclosed herein, those skilled in the art can create DNA molecules and 
polypeptides that have minor variations in their nucleotide or amino acid sequence. 

"Variant" DNA molecules are DNA molecules containing minor changes in the native 
taxadiene synthase sequence, i.e., changes in which one or more nucleotides of a native taxadiene 
synthase sequence are deleted, added, and/or substituted, preferably while substantially maintaining 
taxadiene synthase activity. Variant DNA molecules can be produced, for example, by standard 
DNA mutagenesis techniques or by chemically synthesizing the variant DNA molecule or a portion 
thereof. Such variants preferably do not change the reading frame of the protein-coding region of 
the nucleic acid and preferably encode a protein having no change, only a minor reduction, or an 
increase in taxadiene synthase biological function. 

Amino-acid substitutions are preferably substitutions of single amino-acid residues. DNA 
insertions are preferably of about 1 to 10 contiguous nucleotides and deletions are preferably of about 
1 to 30 contiguous nucleotides. Insertions and deletions are preferably insertions or deletions from 
an end of the protein-coding or non-coding sequence and are preferably made in adjacent base pairs. 
Substitutions, deletions, insertions or any combination thereof can be combined to arrive at a final 
construct. 

Preferably, variant nucleic acids according to the present invention are "silent" or 
"conservative" variants. "Silent" variants are variants of a native taxadiene synthase sequence or a 
homolog thereof in which there has been a substitution of one or more base pairs but no change in 
the amino-acid sequence of the polypeptide encoded by the sequence. "Conservative" variants are 
variants of the native taxadiene synthase sequence or a homolog thereof in which at least one codon 
in the protein-coding region of the gene has been changed, resulting in a conservative change in one 
or more amino acid residues of the polypeptide encoded by the nucleic-acid sequence, i.e., an amino 
acid substitution. A number of conservative amino acid substitutions are listed below. In addition, 
one or more codons encoding cysteine residues can be substituted for, resulting in a loss of a cysteine 
residue and affecting disulfide linkages in the taxadiene synthase polypeptide. 

Original Residue      Conservative Substitutions 
Ala                   Ser 
Arg                   Lys 
Asn                   Gln; His 
Asp                   Glu 
Cys                   Ser 
Gln                   Asn 
Glu                   Asp 
Gly                   Pro 
His                   Asn; Gln 
Ile                   Leu; Val 
Leu                   Ile; Val 
Lys                   Arg; Gln; Glu 
Met                   Leu; Ile 
Phe                   Met; Leu; Tyr 
Ser                   Thr 
Thr                   Ser 
Trp                   Tyr 
Tyr                   Trp; Phe 
Val                   Ile; Leu 

Substantial changes in function are made by selecting substitutions that are less conservative 
than those listed above, e.g., causing changes in: (a) the structure of the polypeptide backbone in 
the area of the substitution; (b) the charge or hydrophobicity of the polypeptide at the target site; or 
(c) the bulk of an amino acid side chain. Substitutions generally expected to produce the greatest 
changes in protein properties are those in which: (a) a hydrophilic residue, e.g., seryl or threonyl, 
is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl; 
(b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an 
electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative 
residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, 
is substituted for (or by) one not having a side chain, e.g., glycine. 

The taxadiene synthase gene sequence can be modified as follows: 

(1) To improve expression efficiency and redirect the targeting of the expressed polypeptide: 
For expression in non-plant hosts (or to direct the expressed polypeptide to a different intracellular 
compartment in a plant host), the native gene sequence can be truncated from the 5' end to remove 
the sequence encoding the plastidial transit peptide of approximately 137 amino acids (i.e., to 
approximately 138S), leaving the sequence encoding the mature taxadiene synthase polypeptide of 
about 725 amino acids. In addition, one or more codons can be changed, for example, to conform 
the gene to the codon usage bias of the host cell for improved expression. Enzymatic stability can 
be altered by removing or adding one or more cysteine residues, thus removing or adding one or 
more disulfide bonds. 
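
The 5' truncation described in item (1) amounts to removing a whole number of codons so that the remaining sequence stays in frame. The sketch below illustrates this on a coding sequence held as a string. The function name, the toy input, and the choice to prepend an ATG start codon when the truncated sequence lacks one are assumptions for illustration; the residue counts are the approximate figures given in the text.

```python
TRANSIT_PEPTIDE_RESIDUES = 137  # approximate length stated in the text
MATURE_RESIDUES = 725           # approximate length stated in the text


def truncate_5prime(cds: str, residues_to_remove: int = TRANSIT_PEPTIDE_RESIDUES,
                    add_start_codon: bool = True) -> str:
    """Remove whole codons from the 5' end of a coding sequence, keeping the frame.

    Assumption: if the truncated sequence no longer begins with ATG, a start
    codon is prepended so that the construct can still be translated.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    cut = 3 * residues_to_remove
    if cut >= len(cds):
        raise ValueError("truncation would remove the entire coding sequence")
    truncated = cds[cut:]
    if add_start_codon and not truncated.upper().startswith("ATG"):
        truncated = "ATG" + truncated
    return truncated
```

With the figures in the text, removing about 137 codons corresponds to about 411 nucleotides, and the precursor is about 137 + 725 = 862 residues long.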

(2) To alter catalytic efficiency: As discussed below, the aspartate-rich DDXXD motif, thought 
to play a role in substrate binding, is also present in taxadiene synthase, as is a related DXXDD motif 
(FIG. 2). Histidine and cysteine residues have been implicated at the active sites of several terpenoid 
cyclases of plant origin. Histidine residues 370, 415 and 793 and cysteines at residues 329, 650 and 
777 of taxadiene synthase are conserved among the plant terpenoid cyclase genes. 

One or more conserved histidine and cysteine residues (as discussed below), or semi-conserved 
residues such as conserved cysteine residues (e.g., residues 329, 650, 719, and 777) and 
histidine residues (e.g., residues 370, 415, 579, and 793), can be mutagenized to alter enzyme 
kinetics. In addition, residues adjacent to these conserved histidine and cysteine residues can also 
be altered to increase the cysteine or histidine content to improve charge stabilization. By increasing 
the aspartate content of the DDXXD and DXXDD motifs (where D is aspartate and X is any amino 
acid), which are likely to be involved in substrate/intermediate binding, it is also possible to increase 
the enzymatic rate (i.e., the rate-limiting ionization step of the enzymatic reaction). Arginines have 
been implicated in binding or catalysis, and conserved arginine residues are also good targets for 
mutagenesis. Changing the conserved DDXXD and/or DXXDD motifs (e.g., the aspartate residues 
thereof) by conventional site-directed mutagenesis methods to match those of other known enzymes 
can also lead to changes in the kinetics or substrate specificity of taxadiene synthase. Additionally, 
product formation can be altered by mutagenesis of the RWWK element (residues 564 to 567), which 
includes aromatic residues that may play a role in stabilizing carbocationic reaction intermediates. 
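
Candidate mutagenesis targets such as the DDXXD, DXXDD, and RWWK elements can be located in a protein sequence by simple pattern matching. The following sketch uses Python regular expressions, with "." standing for X (any amino acid). The function name and the toy sequence in the usage note are hypothetical; the toy sequence is not the taxadiene synthase sequence of FIG. 2.

```python
import re

# Motif names from the text mapped to regular expressions ("." = any residue).
MOTIFS = {
    "DDXXD": "DD..D",
    "DXXDD": "D..DD",
    "RWWK": "RWWK",
}


def find_motifs(protein: str) -> dict:
    """Return {motif name: [(1-based start position, matched residues), ...]}."""
    seq = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits
```

For the made-up sequence "MKLDDAADGGRWWKTTDLVDDQ", this reports DDXXD at position 4, RWWK at position 11, and DXXDD at position 17. Positions are 1-based so that they can be compared directly with residue numbers of the kind quoted in the text.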

(3) To modify substrate utilization: The enzyme, particularly the active site, can be 
modified to allow the enzyme to bind shorter (e.g., C10) or longer (e.g., C25) chains than 
geranylgeranyl diphosphate. Substrate size utilization can be altered by increasing or decreasing the 
size of the hydrophobic patches to modify the size of the hydrophobic pocket of the enzyme. Similar 
effects can be achieved by domain swapping. 

(4) To change product outcome: Directed mutagenesis of conserved aspartate and arginine 
residues can be used to permit the enzyme to produce different diterpene skeletons with, for example, 
one, two, or three rings. 

See, e.g., Cane et al., Biochemistry 34:2480-2488, 1995; Joly and Edwards, J. Biol. Chem. 
268:26983-26989, 1993; Marrero et al., J. Biol. Chem. 267:533-536, 1992; and Song and Poulter, 
Proc. Natl. Acad. Sci. USA 91:3044-3048, 1994. 

Expression of taxadiene synthase nucleic acids in host cells. DNA constructs incorporating 
a taxadiene synthase gene or fragment thereof according to the present invention preferably place the 
taxadiene synthase protein coding sequence under the control of an operably linked promoter that is 
capable of expression in a host cell. Various promoters suitable for expression of heterologous genes 
in plant cells are known in the art, including constitutive promoters, e.g., the cauliflower mosaic virus 
(CaMV) 35S promoter, which is expressed in many plant tissues, organ- or tissue-specific promoters, 
and promoters that are inducible by chemicals such as methyl jasmonate, salicylic acid, or safeners, 
for example. A variety of other promoters or other sequences useful in constructing expression 
vectors are available for expression in bacterial, yeast, mammalian, insect, amphibian, avian, or other 
host cells. 

Nucleic acids attached to a solid support. The nucleic acids of the present invention can be 
free in solution or attached by conventional means to a solid support, such as a hybridization 
membrane (e.g., nitrocellulose or nylon), a bead, or other solid supports known in the art. 

Polypeptides 

The term "taxadiene synthase protein" (or polypeptide) refers to a protein encoded by a 
taxadiene synthase gene, including alleles, homologs, and variants thereof, for example. A taxadiene 
synthase polypeptide can be produced by the expression of a recombinant taxadiene synthase nucleic 
acid or be chemically synthesized. Techniques for chemical synthesis of polypeptides are described, 
for example, in Merrifield, J. Amer. Chem. Soc. 85:2149-2156, 1963. 

Polypeptide sequence identity and similarity. Ordinarily, taxadiene synthase polypeptides 
encompassed by the present invention have at least about 70% amino acid sequence "identity" (or 
homology) compared with a native taxadiene synthase polypeptide, preferably at least about 80% 
identity, and more preferably at least about 90% identity to a native taxadiene synthase polypeptide. 
Preferably, such polypeptides also possess characteristic structural features and biological activity of 
a native taxadiene synthase polypeptide. 

Amino acid sequence "similarity" is a measure of the degree to which aligned amino acid 
sequences possess identical amino acids or conservative amino acid substitutions at corresponding 
positions. 

A taxadiene synthase "biological activity" includes taxadiene synthase enzymatic activity as 
determined by conventional protocols (e.g., the protocol described in Hezari et al., Arch. Biochem. 
Biophys. 322:437-444, 1995, incorporated herein by reference). Other biological activities of 
taxadiene synthase include, but are not limited to, substrate binding, immunological activity (including 
the capacity to elicit the production of antibodies that are specific for taxadiene synthase), etc. 

Polypeptide identity (homology) or similarity is typically analyzed using sequence analysis 
software such as the Sequence Analysis Software Package of the Genetics Computer Group, 
University of Wisconsin Biotechnology Center, Madison, WI. Polypeptide sequence analysis 
software matches polypeptide sequences using measures of identity assigned to various substitutions, 
deletions, and other modifications. 

"Isolated," "Purified," "Homogeneous" Polypeptides. A polypeptide is "isolated" if it has 
been separated from the cellular components (nucleic acids, lipids, carbohydrates, and other 
polypeptides) that naturally accompany it. Such a polypeptide can also be referred to as "pure" or 
"homogeneous" or "substantially" pure or homogeneous. Thus, a polypeptide which is chemically 
synthesized or recombinant (i.e., the product of the expression of a recombinant nucleic acid, even 
if expressed in a homologous cell type) is considered to be isolated. A monomeric polypeptide is 
isolated when at least 60-90% by weight of a sample is composed of the polypeptide, preferably 95% 
or more, and more preferably more than 99%. Protein purity or homogeneity is indicated, for 
example, by polyacrylamide gel electrophoresis of a protein sample, followed by visualization of a 
single polypeptide band upon staining the polyacrylamide gel; high pressure liquid chromatography; 
or other conventional methods. 

Protein purification. The polypeptides of the present invention can be purified by any of 
the means known in the art. Various methods of protein purification are described, e.g., in Guide 
to Protein Purification, ed. Deutscher, Meth. Enzymol. 185, Academic Press, San Diego, 1990; and 
Scopes, Protein Purification: Principles and Practice, Springer Verlag, New York, 1982. 

Variant forms of taxadiene synthase polypeptides; labeling. Encompassed by the taxadiene 
synthase polypeptides according to an embodiment of the present invention are variant polypeptides 
in which there have been substitutions, deletions, insertions or other modifications of a native 
taxadiene synthase polypeptide. The variants substantially retain structural and/or biological 
characteristics and are preferably silent or conservative substitutions of one or a small number of 
contiguous amino acid residues. Preferably, such variant polypeptides are at least 70%, more 
preferably at least 80%, and most preferably at least 90% homologous to a native taxadiene synthase 
polypeptide. 

The native taxadiene synthase polypeptide sequence can be modified by conventional 
methods, e.g., by acetylation, carboxylation, phosphorylation, glycosylation, ubiquitination, and 
labeling, whether accomplished by in vivo or in vitro enzymatic treatment of a taxadiene synthase 
polypeptide or by the synthesis of a taxadiene synthase polypeptide using modified amino acids. 

There are a variety of conventional methods and reagents for labeling polypeptides and 
fragments thereof. Typical labels include radioactive isotopes, ligands or ligand receptors, 
fluorophores, chemiluminescent agents, and enzymes. Methods for labeling and guidance in the 
choice of labels appropriate for various purposes are discussed, e.g., in Sambrook et al. (1989) and 
Ausubel et al. (1987, with periodic updates). 

Polypeptide Fragments. The present invention also encompasses fragments of taxadiene 
synthase polypeptides that lack at least one residue of a native full-length taxadiene synthase 
polypeptide yet retain at least one of the biological activities characteristic of taxadiene synthase, e.g., 
taxadiene synthase enzymatic activity or possession of a characteristic immunological determinant. 
As an additional example, an immunologically active fragment of a taxadiene synthase polypeptide 
is capable of raising taxadiene synthase-specific antibodies in a target immune system (e.g., murine 
or rabbit) or of competing with taxadiene synthase for binding to taxadiene synthase-specific 
antibodies, and is thus useful in immunoassays for the presence of taxadiene synthase polypeptides 
in a biological sample. Such immunologically active fragments typically have a minimum size of 7 
to 17 amino acids. Fragments preferably comprise at least 10, more preferably at least 20, and most 
preferably at least 30 consecutive amino acids of a native taxadiene synthase polypeptide. 

Fusion polypeptides. The present invention also provides fusion polypeptides including, for 
example, heterologous fusion polypeptides, i.e., a taxadiene synthase polypeptide sequence or 
fragment thereof and a heterologous polypeptide sequence, e.g., a sequence from a different 
polypeptide. Such heterologous fusion polypeptides thus exhibit biological properties (such as 
ligand-binding, catalysis, secretion signals, antigenic determinants, etc.) derived from each of the fused 
sequences. Fusion partners include, for example, immunoglobulins, beta-galactosidase, trpE, protein 
A, beta lactamase, alpha amylase, alcohol dehydrogenase, yeast alpha mating factor, and various 
signal and leader sequences which, e.g., can direct the secretion of the polypeptide. Fusion 
polypeptides are typically made by the expression of recombinant nucleic acids or by chemical 
synthesis. 

Polypeptide sequence determination. The sequence of a polypeptide of the present invention 
can be determined by various methods known in the art. In order to determine the sequence of a 
polypeptide, the polypeptide is typically fragmented, the fragments separated, and the sequence of 
each fragment determined. To obtain fragments of a taxadiene synthase polypeptide, the polypeptide 
can be digested with an enzyme such as trypsin, clostripain, or Staphylococcus protease, or with 
chemical agents such as cyanogen bromide, o-iodosobenzoate, hydroxylamine or 
2-nitro-5-thiocyanobenzoate. Peptide fragments can be separated, e.g., by reversed-phase 
high-performance liquid chromatography (HPLC) and analyzed by gas-phase sequencing. 

Antibodies 

The present invention also encompasses polyclonal and/or monoclonal antibodies that are 
specific for taxadiene synthase, i.e., bind to taxadiene synthase and are capable of distinguishing the 
taxadiene synthase polypeptide from other polypeptides under standard conditions. Such antibodies 
are produced and assayed by conventional methods. 

For the preparation and use of antibodies according to the present invention, including 
various immunoassay techniques and applications, see, e.g., Goding, Monoclonal Antibodies: 
Principles and Practice, 2d ed., Academic Press, New York, 1986; and Harlow and Lane, Antibodies: 
A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1988. Taxadiene 
synthase-specific antibodies are useful, for example, in purifying taxadiene synthase polypeptides; 
in cloning taxadiene synthase homologs from Pacific yew or other plant species from an expression 
library; as antibody probes for protein blots and immunoassays; etc. 

Taxadiene synthase polypeptides and antibodies can be labeled by conventional techniques. 
Suitable labels include radionuclides, enzymes, substrates, cofactors, inhibitors, fluorescent agents, 
chemiluminescent agents, magnetic particles, etc. 

Plant transformation and regeneration. Any well-known method for plant cell 
transformation, culture, and regeneration can be employed in the practice of the present 
invention. Methods for introduction of foreign DNA into plant cells include, but are not limited to: 
transfer involving the use of Agrobacterium tumefaciens and appropriate Ti vectors, including binary 
vectors; chemically induced transfer (e.g., with polyethylene glycol); biolistics; and microinjection. 
See, e.g., An et al., Plant Molecular Biology Manual A3:1-19, 1988. 

The invention will be better understood by reference to the following Examples, which are 
intended to merely illustrate the best mode now known for practicing the invention. The scope of 
the invention is not to be considered limited thereto, however. 

EXAMPLE 1: Cloning and Sequencing of a cDNA Encoding Taxa-4(5),11(12)-diene Synthase 

Materials and Methods 
Plants, Substrates, and Standards. Four-year-old T. brevifolia saplings in active growth 
were maintained in a greenhouse. [1-3H]Geranylgeranyl diphosphate (120 Ci/mol) was prepared as 
described previously (LaFever et al., Arch. Biochem. Biophys. 313:139-149, 1994), and authentic 
(±)-taxa-4(5),11(12)-diene was prepared by total synthesis (Rubenstein, J. Org. Chem. 60:7215- 
7223, 1995). 

Library Construction. Total RNA was extracted from T. brevifolia stem using the 
procedures of Lewinsohn and associates (Lewinsohn et al., Plant Mol. Biol. Rep. 12:20-25, 1991) 
developed for woody gymnosperm tissue. Poly(A)+ RNA was purified by chromatography on 
oligo(dT)-cellulose (Pharmacia) and 5 µg of the resulting mRNA was utilized to construct a 
λZAP II cDNA library according to the manufacturer's instructions (Stratagene). 

PCR-Based Probe Generation and Library Screening. Comparison of six available sequences 
for monoterpene, sesquiterpene, and diterpene cyclases from higher plants (Facchini and Chappell, 
Proc. Natl. Acad. Sci. USA 89:11088-11092, 1992; Colby et al., J. Biol. Chem. 268:23016-23024, 
1993; Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994; Back and Chappell, J. Biol. 
Chem. 270:7375-7381, 1995; Sun and Kamiya, Plant Cell 6:1509-1518, 1994; and Bensen et al., 
Plant Cell 7:75-84, 1995) allowed definition of eleven homologous regions for which consensus 
degenerate primers were synthesized. All twenty primers (the most carboxy terminal primer, the 
most amino terminal primer, and nine internal primers in both directions) were deployed in all 
possible combinations with a broad range of amplification conditions using CsCl-purified T. brevifolia 
stem library phage DNA as template (Innis and Gelfand, in PCR Protocols (Innis et al., eds.), pp. 
3-12, 253-258, Academic Press, San Diego, CA, 1990; Sambrook et al., 1989). 
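
The primer count and the number of pairings implied by this design can be checked with a short enumeration. The sketch below assumes that the eleven regions are numbered from the amino terminus, that the most amino terminal region carries only a forward primer and the most carboxy terminal region only a reverse primer, and that a pair can yield a product only when the forward primer lies upstream of the reverse primer. The labels F1 to F10 and R2 to R11 are hypothetical and are not the primer names used in the Example.

```python
def build_primers(n_regions: int = 11):
    """Return (forward_regions, reverse_regions) for a set of conserved regions.

    Assumption: region 1 is the most amino terminal and has a forward primer
    only; region n_regions is the most carboxy terminal and has a reverse
    primer only; every internal region has one primer in each direction.
    """
    forward = list(range(1, n_regions))       # regions 1 .. n-1
    reverse = list(range(2, n_regions + 1))   # regions 2 .. n
    return forward, reverse


def productive_pairs(forward, reverse):
    """List forward/reverse pairs in which the forward primer lies upstream."""
    return [(f"F{f}", f"R{r}") for f in forward for r in reverse if f < r]
```

Under these assumptions, eleven regions give 10 forward and 10 reverse primers (twenty in total, matching the text) and 55 correctly oriented forward/reverse pairs to be tested under each amplification condition.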

Analysis of PCR products by gel electrophoresis (Sambrook et al., 1989) indicated that only 
the combination of primers CC7.2F and CC3R (see FIG. 2) generated a specific DNA fragment 
(~80 bp). This DNA fragment was cloned into pT7Blue (Novagen) and sequenced (DyeDeoxy 
Terminator Cycle Sequencing, Applied Biosystems), and shown to be 83 bp in length. PCR was used 
to prepare approximately 1 µg of this material for random hexamer labeling with [α-32P]dATP (Tabor 
et al., in Current Protocols in Molecular Biology, Ausubel et al., Sections 3.5.9-3.5.10, 1987) and 
use as a hybridization probe to screen filter lifts of 3 x 10^ plaques grown in E. coli LE392 using 
standard protocols (Britten and Davidson, in Nucleic Acid Hybridisation, Hames and Higgins, eds., 
pp. 3-14, IRL Press, Oxford, 1988). 

Of the plaques affording positive signals (102 total), 50 were purified through two additional 
cycles of hybridization. Thirty-eight pure clones were in vivo excised as Bluescript phagemids. The 
insert size was determined by PCR using T3 and T7 promoter primers, and the twelve largest clones 
(insert > 2 kb) were partially sequenced. 

cDNA Expression in E. coli. All of the partially sequenced, full-length inserts were either 
out of frame or bore premature stop sites immediately upstream of the presumptive methionine start 
codon. The latter complication likely resulted from hairpin-primed second-strand cDNA synthesis 
(Old and Primrose, Principles of Gene Manipulation, 4th ed., pp. 34-35, Blackwell Scientific, 
London, 1989). The 2.7-kb insert from pTb42 was cloned into frame by PCR using the 
thermostable, high-fidelity, blunting polymerase Pfu I (Stratagene) and the FRM42 primer 
(downstream of false stop codons) and T7 promoter primer. The resulting blunt fragment was ligated 
into EcoRV-digested pBluescript SK(-) (Stratagene), yielding pTb42.1, and transformed into E. coli 
XL1-Blue (Stratagene). 

To evaluate functional expression of terpene cyclase activity, E. coli XL1-Blue cells 
harboring pTb42.1 were grown (to an optical density of 0.4) on 5 ml LB medium supplemented with 
100 µg/ml ampicillin and 12.5 µg/ml tetracycline before induction with 200 IPTG and subsequent growth 
for 4 h at 25°C. Bacteria were harvested by centrifugation (1800g, 10 min), resuspended in taxadiene 
synthase assay buffer (Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995), disrupted by brief 
sonication at 0-4°C, and the resulting suspension centrifuged (15,000g, 10 min) to pellet debris. The 
supernatant was assayed for taxadiene synthase activity by an established protocol (Hezari et al., 
Arch. Biochem. Biophys. 322:437-444, 1995) in the presence of 15 mM [1-3H]geranylgeranyl 
diphosphate and 1 mM MgCl2, with incubation at 31°C for 4 h. The reaction products were extracted 
with pentane and the extract purified by column chromatography on silica gel as previously described 
(Hezari et al., Arch. Biochem. Biophys. 322:437-444, 1995) to afford the olefin fraction, an aliquot 
of which was counted by liquid scintillation spectrometry to determine 3H incorporation. Control 
experiments with transformed E. coli bearing the plasmid with out-of-frame inserts were also carried 
out. 

The identity of the olefin product of the recombinant enzyme was verified by capillary radio- 
gas chromatography ("capillary radio-GC") (Croteau and Satterwhite, J. Chromatogr. 500:349-354, 
1990) as well as capillary gas chromatography-mass spectrometry ("capillary GC-MS") 
using methods described previously (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995) and 
authentic taxa-4(5),11(12)-diene (Rubenstein, J. Org. Chem. 60:7215-7223, 1995). For GC-MS 
analysis (Hewlett-Packard 6890 GC-MSD), selected diagnostic ions were monitored: m/z 272 [P+]; 
257 [P+-15(CH3)]; 229 [P+-43(C3H7)]; 121, 122, 123 [C-ring fragment cluster]; and 107 [m/z 122 
base peak-15(CH3)]. The origin of the highly characteristic C-ring double cleavage fragment ion 
[base peak, m/z 122(C9H14)] has been described (Koepp et al., J. Biol. Chem. 270:8686-8690, 1995). 
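
The diagnostic ions listed above are mutually consistent under simple nominal-mass arithmetic, which can be checked as follows. The sketch assumes integer nominal masses (C = 12, H = 1) and takes the parent olefin to be C20H32, the molecular formula of taxadiene; that formula is not stated in this passage and is supplied here as an assumption. The names used are hypothetical.

```python
def nominal_mass(carbons: int, hydrogens: int) -> int:
    """Nominal mass of a hydrocarbon fragment using C = 12 and H = 1."""
    return 12 * carbons + hydrogens


PARENT = nominal_mass(20, 32)   # assumed formula C20H32 for the parent ion P+
METHYL = nominal_mass(1, 3)     # CH3 loss
PROPYL = nominal_mass(3, 7)     # C3H7 loss
C_RING = nominal_mass(9, 14)    # C9H14 C-ring double cleavage fragment

# Expected m/z values for the monitored ions.
EXPECTED_IONS = {
    "P+": PARENT,
    "P+ - CH3": PARENT - METHYL,
    "P+ - C3H7": PARENT - PROPYL,
    "C-ring fragment (base peak)": C_RING,
    "base peak - CH3": C_RING - METHYL,
}
```

These values reproduce the monitored ions at m/z 272, 257, 229, 122, and 107; the ions at m/z 121 and 123 flank the base peak as part of the C-ring fragment cluster.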

RESULTS AND DISCUSSION 

cDNA Isolation and Characterization. In general characteristics (molecular weight, divalent 
metal ion requirement, kinetic constants, etc.), taxadiene synthase resembles other terpenoid cyclases 
from higher plants; however, the low tissue titers of the enzyme and its instability under a broad 
range of fractionating conditions impeded purification of the protein to homogeneity (Hezari et al., 
Arch. Biochem. Biophys. 322:437-444, 1995). A 10 µg sample of the electrophoretically-purified 
cyclase, prepared by standard analytical procedures (Schagger and von Jagow, Anal. Biochem. 
166:368-379, 1987; Towbin et al., Proc. Natl. Acad. Sci. USA 76:4350-4354, 1979), failed to provide 
amino-terminal sequence via Edman degradation. Repeated attempts at trypsinization and CNBr 
cleavage of comparable protein samples also failed to provide sequenceable peptides, in large part 
because of very low recoveries. 

As an alternative to cDNA library screening with protein-based oligonucleotide probes, a
PCR-based strategy was developed that was founded on a set of degenerate primers for PCR
amplification designed to recognize highly conserved regions of six higher plant terpene cyclases
whose nucleotide sequences are known. Three of these cyclases, (-)-limonene synthase (a
monoterpene cyclase from spearmint) (Colby et al., J. Biol. Chem. 268:23016-23024, 1993),
epi-aristolochene synthase (a sesquiterpene cyclase from tobacco) (Facchini and Chappell, Proc. Natl.
Acad. Sci. USA 89:11088-11092, 1992; Back and Chappell, J. Biol. Chem. 270:7375-7381, 1995),
and casbene synthase (a diterpene cyclase from castor bean) (Mau and West, Proc. Natl. Acad. Sci.
USA 91:8497-8501, 1994), exploit reaction mechanisms similar to that of taxadiene synthase in the
cyclization of the respective geranyl (C10), farnesyl (C15), and geranylgeranyl (C20) diphosphate
substrates (Lin et al., Biochemistry, in press). Kaurene synthase A from Arabidopsis thaliana (Sun
and Kamiya, Plant Cell 6:1509-1518, 1994) and maize (Bensen et al., Plant Cell 7:75-84, 1995)
and (-)-abietadiene synthase from grand fir (Abies grandis; Stofer Vogel, Wildung, Vogel, and
Croteau, manuscript in preparation) exploit a quite different mechanism that involves protonation of
the terminal double bond of geranylgeranyl diphosphate to initiate cyclization to the intermediate
copalyl diphosphate followed, in the case of abietadiene synthase, by the more typical ionization of
the diphosphate ester function to initiate a second cyclization sequence to the product olefin (LaFever
et al., Arch. Biochem. Biophys. 313:139-149, 1994). The latter represents the only gymnosperm
terpene cyclase sequence presently available.

Comparison of deduced amino acid sequences between all of the cyclases targeted eleven
regions for PCR primer construction. Testing of all twenty primers in all combinations under a
broad range of amplification conditions, followed by product analysis by gel electrophoresis, revealed
that only one combination of primers (CC7.2 (forward) with CC3 (reverse); see FIG. 2 for locations)
yielded a specific DNA fragment (83 bp) using T. brevifolia library phage as template. Primer CC3
delineates a region of strong homology between (-)-limonene synthase (Colby et al., J. Biol. Chem.
268:23016-23024, 1993), epi-aristolochene synthase (Facchini and Chappell, Proc. Natl. Acad. Sci.
USA 89:11088-11092, 1992) and casbene synthase (Mau and West, Proc. Natl. Acad. Sci. USA
91:8497-8501, 1994). Primer CC7.2 was selected based on sequence comparison of the angiosperm
diterpene cyclases (Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994; Sun and
Kamiya, Plant Cell 6:1509-1518, 1994; Bensen et al., Plant Cell 7:75-84, 1995) to the recently
acquired cDNA clone encoding a gymnosperm diterpene cyclase, (-)-abietadiene synthase from grand
fir (Stofer Vogel, Wildung, Vogel, and Croteau, manuscript in preparation).
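
The degeneracy that a primer must cover when it is designed from a conserved peptide can be estimated as sketched below. The sketch counts the distinct coding sequences for a peptide under the standard genetic code. The two peptides are illustrative inputs only and are not the actual CC3 or CC7.2 primer sites; the function name is hypothetical.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of synonymous codons for each residue (for example, 6 for Arg, 1 for Trp).
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def coding_degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    total = 1
    for residue in peptide:
        total *= CODONS_PER_RESIDUE[residue]
    return total


# Illustrative peptides only (assumption): not the real primer sites.
for peptide in ("DDMAD", "RWWK"):
    print(peptide, coding_degeneracy(peptide))
```

Peptides rich in Trp, Met, Asp, or Lys give low degeneracy, which is one reason such regions are commonly favored when degenerate primers are designed.
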

The 83 bp fragment was cloned and sequenced, and thus demonstrated to be cyclase-like.
This PCR product was ³²P-labeled for use as a hybridization probe and employed in high stringency
screening of 3 x 10^ plaques, which yielded 102 positive signals. Fifty of these clones were purified
through two additional rounds of screening, excised in vivo, and the inserts sized. The twelve clones
bearing the largest inserts (> 2.0 kb) were partially sequenced, indicating that they were all
representations of the same gene. Four of these inserts appeared to be full-length.

cDNA Expression in E. coli. All four of the full-length clones that were purified were out
of frame or had stop sites immediately upstream of the starting methionine codon resulting from
hairpin-primed second strand cDNA synthesis. The insert from pTb42 was cloned into frame by
PCR methods, the blunt fragment was ligated into the EcoRV site of pBluescript SK(-), yielding
pTb42.1, and transformed into E. coli XL1-Blue.

Transformed E. coli were grown in LB medium supplemented with antibiotics and induced
with IPTG. The cells were harvested and homogenized, and the extracts were assayed for taxadiene
synthase activity using standard protocols with [1-³H]geranylgeranyl diphosphate as substrate (Hezari
et al., Arch. Biochem. Biophys. 322:437-444, 1995). The olefin fraction isolated from the reaction
mixture contained a radioactive product (1 nmol) that was coincident on capillary radio-GC with
authentic taxa-4(5),11(12)-diene (Rt = 19.40 ± 0.13 min).

The identification of this diterpene olefin was confirmed by capillary GC-MS analysis. The
retention time (12.73 min vs. 12.72 min) and selected ion mass spectrum (Table 1) of the diterpene
olefin product were identical to those of authentic (±)-taxa-4(5),11(12)-diene (Rubenstein, J. Org.
Chem. 60:7215-7223, 1995). The origin of the selected diagnostic ions shown in Table 1, which
account for most of the full spectrum abundance, is described herein and elsewhere (Koepp et al., J.
Biol. Chem. 270:8686-8690, 1995). Because of different sample sizes, the total abundance of the
authentic standard (2.96 E^) was approximately twice that of the biosynthetic olefin (1.42 E^). This,
and variation in background between runs, probably account for minor differences in relative
abundances of the high mass fragments.
Table 1: GC-MS Analysis of the Diterpene Olefin Synthesized by Recombinant Taxadiene Synthase
("Product") Compared to Authentic Taxa-4(5),11(12)-diene ("Standard")

            Relative Abundance (%)
m/z         Product        Standard
107         15.3           15.3
121         14.3           14.3
122         58.1           57.8
123         10.2           10.3
229         0.56           0.71
257         0.35           0.45
272         1.19           1.17
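
The agreement between the two selected-ion spectra in Table 1 can be quantified with a short sketch. It uses only the tabulated relative abundances; cosine similarity is a common spectral-matching score chosen here for illustration and is not the comparison method stated in this description.

```python
import math

# Table 1 values: m/z -> (product, standard) relative abundance in percent.
TABLE_1 = {
    107: (15.3, 15.3),
    121: (14.3, 14.3),
    122: (58.1, 57.8),
    123: (10.2, 10.3),
    229: (0.56, 0.71),
    257: (0.35, 0.45),
    272: (1.19, 1.17),
}

product = [p for p, _ in TABLE_1.values()]
standard = [s for _, s in TABLE_1.values()]


def cosine_similarity(a, b):
    """Cosine of the angle between two abundance vectors (1.0 means identical shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Ion with the highest relative abundance in the product spectrum.
base_peak = max(TABLE_1, key=lambda mz: TABLE_1[mz][0])
# Largest absolute difference between product and standard, in percentage points.
largest_gap = max(abs(p - s) for p, s in TABLE_1.values())

print("base peak m/z:", base_peak)
print("largest abundance gap (percentage points):", round(largest_gap, 2))
print("cosine similarity:", round(cosine_similarity(product, standard), 5))
```
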



Since identically prepared extracts of control cultures of E. coli that were transformed with
pBluescript bearing an out-of-frame insert were incapable of transforming geranylgeranyl diphosphate
to detectable levels of diterpene olefin, these results confirm that clone pTb42.1 encodes the taxadiene
synthase from Pacific yew.

Sequence Analysis. Both strands of the inserts from pTb42 and pTb42.1 were sequenced.
No mistakes were incorporated by Pfu polymerase. The pTb42.1 taxadiene synthase cDNA is 2700
nucleotides in length and contains a complete open reading frame of 2586 nucleotides (FIG. 2). The
deduced amino acid sequence indicates the presence of a putative plastidial transit peptide of
approximately 137 amino acids and a mature protein of about 725 residues (~82.5 kDa), based on
the size of the native (mature) enzyme (~79 kDa) as estimated by gel permeation chromatography
and sodium dodecyl sulfate-polyacrylamide gel electrophoresis ("SDS-PAGE") (Hezari et al., Arch.
Biochem. Biophys. 322:437-444, 1995), the characteristic amino acid content and structural features
of such amino-terminal targeting sequences, and their cleavage sites (Keegstra et al., Annu. Rev. Plant
Physiol. Plant Mol. Biol. 40:471-501, 1989; von Heijne et al., Eur. J. Biochem. 180:535-545, 1989),
and the fact that diterpene biosynthesis is localized exclusively within plastids (West et al., Rec. Adv.
Phytochem. 13:163-198, 1979; Kleinig, Annu. Rev. Plant Physiol. Plant Mol. Biol. 40:39-59, 1989).
The transit peptide/mature protein junction and thus the exact lengths of both moieties are unknown,
because the amino terminus of the mature protein is apparently blocked and has not yet been
identified.
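
The reading-frame arithmetic stated above can be verified with a minimal sketch. It assumes the standard genetic code and uses the first 48 nucleotides of the open reading frame as printed in FIG. 2; the function and constant names are illustrative.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ORF_NT = 2586      # open reading frame length stated in the text
TRANSIT_AA = 137   # approximate transit peptide length stated in the text
MATURE_AA = 725    # approximate mature protein length stated in the text

orf_codons = ORF_NT // 3
print("codons in ORF:", orf_codons)
print("transit + mature residues:", TRANSIT_AA + MATURE_AA)

# First 16 codons of the open reading frame as printed in FIG. 2.
first_row = "ATGGCTCAGCTCTCATTTAATGCAGCGCTGAAGATGAACGCATTGGGG"
print(translate(first_row))
```

The 2586-nucleotide frame corresponds to 862 codons, which matches the sum of the approximate transit peptide and mature protein lengths given above.
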

Pairwise sequence comparison (Feng and Doolittle, Methods Enzymol. 183:375-387, 1990;
Genetics Computer Group, Program Manual for the Wisconsin Package, Version 8, Genetics Computer
Group, Madison, WI, 1994) with other terpene cyclases from higher plants revealed a significant
degree of sequence similarity at the amino acid level. The taxadiene synthase from yew showed 32%
identity and 55% similarity to (-)-limonene synthase from spearmint (Colby et al., J. Biol. Chem.
268:23016-23024, 1993), 30% identity and 54% similarity to epi-aristolochene synthase from tobacco
(Facchini and Chappell, Proc. Natl. Acad. Sci. USA 89:11088-11092, 1992), 31% identity and 56%
similarity to casbene synthase from castor bean (Mau and West, Proc. Natl. Acad. Sci. USA
91:8497-8501, 1994), 33% identity and 56% similarity to kaurene synthase A from Arabidopsis thaliana
and maize (Sun and Kamiya, Plant Cell 6:1509-1518, 1994; Bensen et al., Plant Cell 7:75-84,
1995), and 45% identity and 67% similarity to (-)-abietadiene synthase from grand fir (Stofer Vogel,
Wildung, Vogel, and Croteau, manuscript in preparation). Pairwise comparisons of other members
within this group show roughly comparable levels of identity (30-40%) and similarity (50-60%).
These terpenoid synthases represent a broad range of cyclase types from diverse plant families,
supporting the suggestion of a common ancestry for this class of enzymes (Colby et al., J. Biol.
Chem. 268:23016-23024, 1993; Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994;
Back and Chappell, J. Biol. Chem. 270:7375-7381, 1995; McGarvey and Croteau, Plant Cell
7:1015-1026, 1995; Chappell, Annu. Rev. Plant Physiol. Plant Mol. Biol. 46:521-547, 1995).
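
Percent identity and percent similarity of the kind reported above can be derived from a pairwise alignment as sketched below. The residue groupings and the two short aligned strings are hypothetical and serve only as illustration. The cited GCG programs score similarity with a substitution matrix rather than with fixed groups, so this sketch is not expected to reproduce the reported values.

```python
# Hypothetical conservative-substitution groups (assumption, for illustration only).
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in group and y in group for group in CONSERVATIVE_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment (hypothetical sequences, not taken from any cyclase).
seq_a = "DDMADIF-ATG"
seq_b = "DDLYDVFSASK"
identity, similarity = identity_and_similarity(seq_a, seq_b)
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```
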

The amino acid sequence of taxadiene synthase does not closely resemble (identity ~20%;
similarity ~40%) that of any of the microbial sesquiterpene cyclases that have been determined
recently (Hohn and Beremand, Gene (Amst.) 79:131-136, 1989; Proctor and Hohn, J. Biol. Chem.
268:4543-4548, 1993; Cane et al., Biochemistry 33:5846-5857, 1994), nor does the taxadiene
synthase sequence resemble any of the published sequences for prenyltransferases (Chen et al.,
Protein Sci. 3:600-607, 1994; Scolnik and Bartley, Plant Physiol. 104:1469-1470, 1994; Attucci et
al., Arch. Biochem. Biophys. 321:493-500, 1995), a group of enzymes that, like the terpenoid
cyclases, employ allylic diphosphate substrates and exploit similar electrophilic reaction mechanisms
(Poulter and Rilling, in Biosynthesis of Isoprenoid Compounds, Porter and Spurgeon, eds., vol. 1,
pp. 161-224, Wiley & Sons, New York, NY, 1981). The aspartate-rich (I,L,V)XDDXX(XX)D
motif(s) found in most prenyltransferases and terpenoid cyclases (Facchini and Chappell, Proc. Natl.
Acad. Sci. USA 89:11088-11092, 1992; Colby et al., J. Biol. Chem. 268:23016-23024, 1993; Mau
and West, Proc. Natl. Acad. Sci. USA 91:8497-8501, 1994; Back and Chappell, J. Biol. Chem.
270:7375-7381, 1995; Hohn and Beremand, Gene (Amst.) 79:131-136, 1989; Proctor and Hohn, J.
Biol. Chem. 268:4543-4548, 1993; Cane et al., Biochemistry 33:5846-5857, 1994; Chen et al.,
Protein Sci. 3:600-607, 1994; Scolnik and Bartley, Plant Physiol. 104:1469-1470, 1994; Attucci et
al., Arch. Biochem. Biophys. 321:493-500, 1995; Abe and Prestwich, J. Biol. Chem. 269:802-804,
1994), and thought to play a role in substrate binding (Chen et al., Protein Sci. 3:600-607, 1994;
Abe and Prestwich, J. Biol. Chem. 269:802-804, 1994; Marrero et al., J. Biol. Chem.
267:21873-21878, 1992; Joly and Edwards, J. Biol. Chem. 268:26983-26989, 1993; Tarshis et al.,
Biochemistry 33:10871-10877, 1994), is also present in taxadiene synthase, as is a related DXXDD
motif (FIG. 2). Histidine and cysteine residues have been implicated at the active sites of several
terpenoid cyclases of plant origin (Rajaonarivony et al., Arch. Biochem. Biophys. 299:77-82, 1992;
Savage et al., Arch. Biochem. Biophys. 320:257-265, 1995). A search of the aligned sequences
revealed that three histidines (at positions 370, 415 and 793) and three cysteines (at positions 329,
650 and 777) of taxadiene synthase are conserved among the plant terpenoid cyclase genes. The
taxadiene synthase from yew most closely resembles the abietadiene synthase from grand fir rather
than the casbene synthase from castor bean (Mau and West, Proc. Natl. Acad. Sci. USA 91:8497-8501,
1994), which catalyzes a similar type of cyclization reaction but is phylogenetically quite distant.
The abietadiene synthase from grand fir is the only other terpenoid cyclase sequence from a
gymnosperm now available (Stofer Vogel, Wildung, Vogel, and Croteau, in preparation), and these two
diterpene cyclases from the Coniferales share several regions of significant sequence homology, one
of which was fortuitously chosen for primer construction and proved to be instrumental in the
acquisition of a PCR-derived probe that led to the cloning of taxadiene synthase.
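
The motifs discussed above can be located in the deduced sequence by a simple pattern search, sketched below. The three fragments are short stretches read from FIG. 2, each paired with the position of its first residue; the dictionary and function names are illustrative. The patterns treat X as any residue, which is an assumption about how the motifs are meant to be read.

```python
import re

# Motif patterns; lookahead allows overlapping matches to be reported.
MOTIFS = {
    "DDXXD": r"(?=(DD..D))",
    "DXXDD": r"(?=(D..DD))",
    "RWWK": r"(?=(RWWK))",
}

# Fragments of the deduced sequence read from FIG. 2: first residue number -> residues.
FRAGMENTS = {
    513: "SYIDSYDDNYVWQRKT",
    561: "LLTRWWKESGMADI",
    609: "QVLFDDMADIFATLDE",
}


def find_motifs(fragment, start):
    """Return (motif name, residue number, matched residues) for each hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, fragment):
            hits.append((name, start + match.start(), match.group(1)))
    return hits


for start, fragment in FRAGMENTS.items():
    for name, position, residues in find_motifs(fragment, start):
        print(f"{name} at residue {position}: {residues}")
```
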

EXAMPLE 3: Expression of Taxadiene Synthase Genes Truncated to Remove Transit Peptide
Sequences

The native taxadiene synthase gene sequence was truncated from the 5' end to remove part
or all of the sequence that encodes the plastidial transit peptide of approximately 137 amino acids (the
mature taxadiene synthase polypeptide is about 725 amino acids). Deletion mutants were produced
that remove amino acid residues from the amino terminus up to residue 31 (Glu), 39 (Ser), 49 (Ser),
54 (Gly), 79 (Val), or 82 (Ile). These mutants were expressed in E. coli cells and cell extracts were
assayed for taxadiene synthase activity as described above. In preliminary experiments, expression
of truncation mutants was increased over wild-type taxadiene synthase by up to about 50%, with
further truncation past residues 83-84 apparently decreasing taxadiene synthase activity.

Truncation of at least part of the plastidial transit peptide improves taxadiene synthase
expression. Moreover, removing this sequence improves purification of taxadiene synthase, since
the transit peptide is recognized by E. coli chaperonins, which co-purify with the enzyme and
complicate purification, and because the taxadiene synthase preprotein tends to form inclusion bodies
when expressed in E. coli.

The actual cleavage site for removal of the transit peptide may not be at the predicted
cleavage site between residue 136 (Ser) and residue 137 (Pro). A transit peptide of 136 residues
appears quite long, and other (monoterpene) synthases have a tandem pair of arginines (Arg-Arg) at
about residue 60 (Met). Truncation immediately amino-terminal to the tandem pair of arginines of
these synthases has resulted in excellent expression in E. coli. Taxadiene synthase lacks an Arg-Arg
element. Also, truncation beyond residues 83-84 leads to lower activity.

This invention has been detailed both by example and by direct description. It should be
apparent that one having ordinary skill in the relevant art would be able to surmise equivalents to the
invention as described in the claims which follow but which would be within the spirit of the
foregoing description. Those equivalents are to be included within the scope of this invention.
WHAT IS CLAIMED IS:

1. An isolated polynucleotide comprising at least 15 consecutive nucleotides of a native
taxadiene synthase gene.

2. An isolated polynucleotide comprising at least 30 consecutive nucleotides of a native
taxadiene synthase gene.

3. The polynucleotide of claim 1 comprising a sequence that encodes a polypeptide having
taxadiene synthase biological activity.

4. A cell comprising the polynucleotide of claim 1.

5. A plant cell comprising the polynucleotide of claim 1.

6. A transgenic plant comprising the polynucleotide of claim 1.

7. An isolated polynucleotide comprising a polypeptide-encoding sequence that encodes a
polypeptide with taxadiene synthase biological activity, wherein the polypeptide-encoding sequence
has at least 70% nucleotide sequence similarity with a native Pacific yew taxadiene synthase gene.

8. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence has at least
80% nucleotide sequence similarity with the native Pacific yew taxadiene synthase gene.

9. The polynucleotide of claim 8 wherein the polypeptide-encoding sequence has at least
90% nucleotide sequence similarity with the native Pacific yew taxadiene synthase gene.

10. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide having only conservative amino acid substitutions to the taxadiene synthase polypeptide
sequence of FIG. 2 except for an amino acid substitution at at least one location selected from the
group consisting of: cysteine residues 329, 650, 719, and 777; histidine residues 370, 415, 579, and
793; a DDXXD motif; a DXXDD motif; a conserved arginine; and a RWWK element.

11. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide having only conservative amino acid substitutions to a native Pacific yew taxadiene
synthase polypeptide sequence.

12. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide that lacks at least part of a transit peptide sequence of a native Pacific yew taxadiene
synthase polypeptide.

13. The polynucleotide of claim 7 wherein the polypeptide-encoding sequence encodes a
polypeptide that is completely homologous with a native taxadiene synthase polypeptide.

14. A cell comprising the polynucleotide of claim 7. 

15. A plant cell comprising the polynucleotide of claim 7. 

16. A transgenic plant comprising the polynucleotide of claim 7.

17. An isolated polypeptide having taxadiene synthase activity.

18. The polypeptide of claim 17 having at least 70% amino acid sequence identity with a
native Pacific yew taxadiene synthase polypeptide.

19. The polypeptide of claim 18 having at least 80% amino acid sequence identity with the
taxadiene synthase polypeptide.

20. The polypeptide of claim 19 having at least 90% amino acid sequence identity with the
taxadiene synthase polypeptide.

21. The polypeptide of claim 17 having only conservative substitutions to the native Pacific
yew taxadiene synthase polypeptide sequence except for an amino acid substitution at one or more
locations of the native Pacific yew taxadiene synthase polypeptide sequence that are selected from
the group consisting of: cysteine residues 329, 650, 719, and 777; histidine residues 370, 415, 579,
and 793; a DDXXD motif; a DXXDD motif; a conserved arginine; and a RWWK element.

22. The polypeptide of claim 17 having only conservative amino acid substitutions to the 
native Pacific yew taxadiene synthase polypeptide sequence. 

23. The polypeptide of claim 17 that is completely homologous with the native Pacific yew 
taxadiene synthase polypeptide sequence. 

24. The polypeptide of claim 17 lacking part or all of a transit peptide. 

25. An isolated polypeptide comprising at least 10 consecutive amino acids of a native 
Pacific yew taxadiene synthase polypeptide. 

26. An isolated mature native Pacific yew taxadiene synthase polypeptide. 

27. An antibody specific for a native Pacific yew taxadiene synthase polypeptide.

28. A method of expressing a taxadiene synthase polypeptide in a cell, the method
comprising the steps of:

providing a cell that comprises an expressible polynucleotide that encodes a taxadiene
synthase polypeptide according to claim 17; and

culturing the cell under conditions suitable for expression of the polypeptide.

29. The method of claim 28 wherein the cell is a taxoid-producing cell.

30. The method of claim 29 wherein expression of the polynucleotide causes the cell to 
produce a higher level of a taxoid than an otherwise similar cell that lacks the expressible 
polynucleotide. 

31. A method of obtaining a taxadiene synthase gene comprising the steps of:

contacting a nucleic acid of a taxoid-producing organism with a probe or primer comprising
a polynucleotide of claim 1 under stringent hybridization conditions, thereby causing the probe or
primer to hybridize to a taxadiene synthase gene of the organism; and

isolating the taxadiene synthase gene of the organism.
FIGURE 1
TTCCCCTGCCTCTCTGGAGAA 21

ATGGCTCAGCTCTCATTTAATGCAGCGCTGAAGATGAACGCATTGGGG 69

MAQLSFNAALKMNALG
5 10 15

AACAAGGCAATCCACGATCCAACGAATTGCAGAGCCAAATCTGAGCGC 117 
NKAIKDPTNCRAKS E^R 
20 25 30 

CAAATGATGTGGGTTTGCTCCAGATCAGGGCGAACCAGAGTAAAAATG 16 5 

QMMWVCS^RSGRTRV KM 
35 40 45 

TCGAGAGGAAGTGGTGGTCCTGGTCCTGTCGTAATGATGAGCAGCAGC 213 
StRGSGG^PGPVVMMS SS 
50 55 60 

ACTGGCACTAGCAAGGTGGTTTCCGAGACTTCCAGTACCATTGTGGAT 2 61 

TGTSKVVSETSST I VtD 
65 70 73 80 

GATATCCCTCGACTCTCCGCCAATTATCATGGCGATCTGTGGCACCAC 3 0 9 

DITPRLSANYHGDLWHH 
85 90 95 

AATGTTATACAAACTCTGGAGACACCGTTTCGTGAGAGTTCTACTTAC 3 57 

NVIQTLETPFRESSTY 
100 105 110 

CAAGAACGGGCAGATGAGCTGGTTGTGAAAATT AAAG AT ATGTTC AAT 4 05 

QERADELVVKIKDMFN 
115 120 125 

GCGCTCGGAGACGGAGATATCAGTCCGTCTGCATACGACACTGCGTGG 4 53 

ALGDGDI SvPSAYDTAW 
130 135 140 

GTGGCGAGGCTGGCGACCATTTCCTCTGATGGATCTGAGAAGCCACGG 501 

VARLATISSDGSEKPR 

145 150 155 160 

TTTCCTCAGGCCCTCAACTGGGTTTTCAACAACCAGCTCCAGGATGGA 54 9 

FPQALNWVFNNQLQDG 
165 170 175 

TCGTGGGGTATCGAATCGCACTTTAGTTTATGCGATCG ATTGCTT AAC 5 97 

SWGIESHFSLCDRLLN 
180 185 190 



FIGURE 2-1 
ACGACCAATTCTGTTATCGCCCTCTCGGTTTGGAAAACAGGGCACAGC 645

TTNSVIALSVWKTGKS 
195 200 205 

CAAGTACAACAAGGTGCTGAGTTTATTGCAGAGAATCTAAGATTACTC 693 
QVQQGAEFIAENLRLL 
210 215 220 

AATGAGGAAGATGAGTTGTCCCCGGATTTCCAAATAATCTTTCCTGCT 
NEEDELSPDFQI IFPA 
225 . 230 235 240 

CTGCTGCAAAAGGCAAAAGCGTTGGGGATCAATCTTCCTTACGATCTT 78 9 

LLQKAKALGINLPYDL 
245 250 255 

CCATTTATCAAATATTTGTCGACAACACGGGAAGCCAGGCTTACAGAT 8 3 7 

P F ^ KYLSTTRE ARLTD 
260 265 270 

GTTTCTGCGGCAGCAGACAATATTCCAGCCAACATGTTGAATGCGTTG 88 5 

VSAAADN'I PANMLNAL 
275 280 285 

GAAGGTCTCGAGGAAGTTATTGACTGGAACAAGATTATGAGGTTTCAA 93 3 

EGLEEVIDWNKIMRc-Q 
290 295 300 

AGTAAAGATGGATCTTTCCTGAGCTCCCCTGCCTCCACTGCCTGTGTA 9 8 ^ 

SrCDGSFLSSPASTACV 

305 310 315 320 

CTGATGAATACAGGGGACGAAAAATGTTTCACTTTTCTCAACAATCTG 102 9 
LtMNTGDEK[c]FTFLNNL 
325 330 335 

CTCGACAAATTCGGCGGCTGCGTGCCCTGTATGTATTCCATCGATCTG 1077 

LDKFGGCVPCMYSIDL 
340 345 

CTGGAACGCCTTTCGCTGGTTGATAACATTGAGCATCTCGGAATCGGT " 125 
LERLSLVDNIEHLGIG 
355 360 365 

CGCCATTTCAAACAAGAAATCAAAGGAGCTCTTGATTATGTCTACAGA 1173 

R[h]fkqeikgaldyvyr 

370 375 380 



FIGURE 2-2 
CATTGGAGTGAAAGGGGCATCGGTTGGGGCAGAGACAGCCTTGTTCCA 1221

HWSERGIGWGRDSLVP 

385 390 395 400 

GATGTCAACACCACAGCCCTCGGCCTGCGAACTCTTCGCATGCAGGGA 12 6 9 
DLMTTALGLRTLRMfHlG 
405 410 415 

TACAATGTTTGTTCAGACGTTTTGAATAATTTCAAAGATGAAAA CGGG 1317 
YNVSSDVLNNFKDENG 
420 425 430 

CGGTTCTTCTCCTCTGCGGGCCAAACCCATGTCGAATTGAGAAGCGTG 13 6 5 
RFFSSAGQTHVELRSV 
435 440 445 

GTGAATCTTTTCAGAGCTTCCGACCTTGCATTTCCTGACGAAAGAGCT 1413 
VNLFRASDLAFPDERA 
450 455 460 

ATGGACGATGCTAGAAAATTTGCAGAACCATATCTTAGAGAGGCACTT 14 61 
MDDAR KFAEPYLREAL 
465 470 475 480 

GCAACGAAAATCTCAACCAATACAAAACTATTCAAAG AGATTGAGTAC 15 0 9 
ATKISTNTKLFKEIEY 
485 490 495 

GTGGTGGAGTACCCTTGGCACATGAGTATCCCACGCTTAGAAGCCAGA 15 5 7 
VVEYPWHMSIPRLEAR 
500 505 510 

AGTTATATTGATTCATATGACGACAATTATGTATGGCAGAGGAAGACT 16 0 5 
3YIDSYDDNYVWQRKT 
515 520 525 

CTATATAGAATGCCATCTTTGAGTAATTCAAAATGTTTA GAATTGGCA 16 5 3 
LYRMPSLSNSKCLELA 
530 535 540 

AAATTGGACTTCAAT ATCGTACAATCTTTGCATCAAGAnr;AnTTr;AAr: 17 01 
KLDFN IVQSLHQEELK 
545 550 555 560 

CTTCTAAC AAGATGGTGGAAGGAAT CCGGCATGGCAGATATAAA— : 7 4 ^ 
L L T |R W W K| E S G M A D I N F 
565 570 575 



FIGURE 2-3 
ACTCGACACCGAGTGGCGGAGGTTTATTTTTCATCAGCTACATTTGAA 1797
TRHRVAEVYFSSATFE 
580 585 590 

CCCGAATATTCTGCCACTAGAATTGCCTTCACAAAAATTGGTTGTTTA 184 5 
PEYSATRIAFTKIGCL 
595 600 605 

CAAGTCCTTTTTGATGATATGGCTGACATCTTTGC AACACTAG ATGAA 18 93 
QVLFDDMADI FATLDE 
610 . 615 620 

TTGAAAAGTTTCACTGAGGGAGTAAAGAG ATGGG ATACATCTTTGCTA 1941 

lksftegvkrwdtsll ' 

630 635 640 

CATGAGATTCCAGAGTGTATGCAAACTTGCTTTAAAGTTTGGTTCAAA 198 9 
H E I P E C M Q T [C] F K V W f' K 
645 650 655 

TTAATGGAAGAAGTAAATAATGATGTGGTTAAGGTACAAGGACGTGAC 20 3 7 

lmeevnndvvkvqgrd 

660 665 670 

ATGCTCGCTCACATAAG AAAACCCTGGGAGTTGTACTTC AATTGTTAT 9 08 5 

mlahirkpwelyfncy' " 

675 680 685 

GTACAAGAAAGGGAGTGGCTTGAAGCCGGGTATAT ACCAACTTTTGAA 213 3 
VQEREWLEAGYI PTF E 
690 695 700 

GAGTACTTAAAGACTTATGCTATATCAGTAGGCCTTGGACCGTGTACC 2181 
EYLKTYAISVGLGPC^ 

710 715 ■ ' ^ 720 

CTACAACCAATACTACTAATGGGTGAGCTTGTGAAAGATGATGT-TGTT 22 2 9 

LQPILLMGELVKDDV V 
725 730 

GAGAAAGTGCACTATCCCTCAAATATGTTTGAGCTTGTATCCTTGAGC 22 77 
cKVHYPSNMFELVSLS 
740 745 750 

TGGCGACTAACAAACGACACCAAAACATATCAGGCTGAAAAGGCTCGA 2 3 2 5 
WRLTNDTKTYQAEKAR 
755 760 765 
GGACAACAAGCCTCAGGCATAGCATGCTATATGAAGGATAATCCAGGA 2373
GQQASGI a[c]YMKDN PG 
770 775 780 

f 

GCAACTGAGGAAGATGCCATTAAGCACATATGTCGTGTTGTTGATCGG 2 4 21 
A TEEDAI KrHllCRVVDR 
785 790 795 800 

GCCTTGAAAGAAGCAAGCTTTGAATATTTCAAACCATCCAATGATATC 24 6 9 
ALKEASFEYFKPSNDI 
805 810 815 

CCAATGGGTTGCAAGTCCTTTATTTTTAACCTTAGATTGTGTGTCCAA 2 517 
PMGCKSF I FNLRLCVQ 
820 825 830 

ATCTTTTACAAGTTTATAGATGGGTACGG AATCGCCAATGAGG AGATT 2 5 6 5 
IFYKFIDGYGIANEEI 
835 840 845 

AAGGACTATATAAGAAAAGTTTATATTGATCCAATTCAAGTATGA 2 610 

KDYIRKV-YIDPIQV* 
850 855 860 

TATATCATGTAAAACCTCTTTTTCATGATAAATTGACTTATTATTGTA 2 6 58 
TTGGC AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2 7 00 